
Come2us #3 - Infra & DevOps BluePrint

Date: 2025/11/30
Category: Goorm
Tags: AWS, Kubernetes, CI/CD, Monitoring
Part: Infra & DevOps
Authors: 이우석 (detailed content), 홍모세 (diagrams)

1. Overview

• Project name: COME2US (Sprint #3)
• Goals:
  ◦ Extend the ECS-based second-phase architecture to Kubernetes (EKS)
  ◦ Establish a GitOps-based declarative deployment structure (GitHub Actions + ArgoCD)
  ◦ Adopt Istio as the service mesh to establish security (mTLS), traffic control, and observability
  ◦ Template Kubernetes manifests with Helm
  ◦ Partially introduce a Kafka (MSK) based event system, centered on the Payment service
  ◦ Keep infrastructure automation centered on Terraform
  ◦ Introduce a Canary deployment strategy using Istio
  ◦ Introduce Karpenter-based automatic node provisioning
• Core layers

Layer | Main components
Network Layer | VPC, Subnet (Public/App/Data), NAT, LB, Route53
Compute Layer | EKS, NodeGroup, Karpenter
Service Layer | Spring Boot Microservices, Istio, Envoy Sidecar, Helm
Data Layer | RDS (PostgreSQL), Redis, MSK (Kafka), S3
DevOps Layer | GitHub Actions (CI), ArgoCD (CD), Terraform, Helm
Observability Layer | CloudWatch, Prometheus, Grafana, Loki, Tempo (Tracing)

2. System Architecture Summary

[Figure] Architecture diagram

Summary of major changes (ECS → EKS migration)

Area | Project 2 (existing) | Project 3 (revised)
Container orchestration | ECS Fargate | EKS + Karpenter
Deployment pipeline | Jenkins + Terraform | GitHub Actions (CI) + ArgoCD (CD)
Service discovery | Eureka | K8s Service + Istio Mesh
Traffic routing | Spring Cloud Gateway | Istio Ingress Gateway + Envoy
App configuration management | Config Server | Helm Values + ConfigMap/Secret
Event system | - | Kafka (MSK) introduced (payment service, log pipeline)
Observability | CloudWatch-centric | Prometheus + Grafana + Loki + Tempo
Deployment strategy | Blue/Green | Canary Deployment (Istio-based)

3. Network Structure

• Same 2-AZ layout as Project 2
• Public, Private (Application), and Private (Database) subnets are separated to layer traffic and security
  ◦ Same as before
• The Istio Ingress Gateway is a LoadBalancer Service and creates an AWS NLB
• All service-to-service communication goes through the Envoy Sidecar

Category | Role | Planned configuration
Public Subnet | Externally exposed resources | ALB (Grafana, ArgoCD), NLB (Istio Ingress Gateway), NAT
Private Subnet (Application) | Internal services | EKS Worker Node
Private Subnet (Database) | Data resources | RDS, Redis, MSK
Routing | IGW / NAT | Outbound → NAT
DNS | Route53 + ACM | Certificate management for HTTPS traffic

4. Application Layer

Components

Component | Description
EKS Cluster | Runtime environment for all MSA services
NodeGroup + Karpenter Autoscaling | Minimal On-demand baseline + autoscaling through Karpenter
Pod / Deployment / Service | Internal services are exposed through ClusterIP
Istio (Service Mesh) | mTLS, Traffic Splitting, Routing, Circuit Breaker
Envoy Sidecar | Proxy for Pod-to-Pod communication
Helm Charts | Microservice template management
ArgoCD | GitOps-based continuous delivery (CD)

Service flow

Client → Route53 → Istio Ingress Gateway → Envoy Sidecar → K8s Service → Microservice Pod → RDS / Redis / MSK (Kafka) → Envoy Sidecar → Istio Ingress Gateway → Client response

5. Data Layer

• The existing databases are kept as they are, and MSK (Kafka) is introduced for event-based messaging

Component | Design intent
RDS (PostgreSQL) | Multi-AZ (Writer: AZ-a, Standby: AZ-b, multiple Readers)
Redis (ElastiCache) | Session Redis (sessions + non-persistent data store); Cache Redis (cache store)
MSK (Kafka) | Applied to the Payment service; applied to the log pipeline
S3 | Image storage and static object storage; private connectivity through a VPC Endpoint

6. Event-Driven Architecture - MSK

MSK (Kafka) is the core messaging platform used to implement SAGA compensating transactions across the order, product, and coupon services, starting with payment approval results.
This project begins with a partial adoption centered on the payment service and expands the EDA incrementally.

6.1 Background

Purpose | Description
SAGA compensating transactions | Revert order/stock/coupon when payment approval fails
Removing tight coupling between services | Replace synchronous HTTP calls with asynchronous message-based communication
Blocking failure propagation | Isolate external service failures so they do not propagate to the order service

6.2 Event Flow

Payment approval success flow

1. payment-service → publishes payments-paid
2. order-service consumes → changes the order status to PREPARING

Payment approval failure (compensation flow)

1. payment-service → publishes payments-failed
2. order-service consumes → reverts the order and coupon (idempotent)
3. order-service → publishes orders-failed
4. product-service consumes → rolls back stock

6.3 Topic Design

Event | Topic name | Producer | Consumer
Payment approval succeeded | payments-paid | payment-service | order-service
Payment approval failed | payments-failed | payment-service | order-service
Order failure compensation | orders-failed | order-service | product-service
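
The success and compensation flows above can be sketched with an in-memory stand-in for the three topics. This is only an illustration under stated assumptions: `SagaFlowSketch`, its `log` field, and the log strings are hypothetical, and a real deployment would use Kafka producers and consumers against MSK rather than direct method calls.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// In-memory stand-in for the Kafka topics (assumption: no real broker involved).
class SagaFlowSketch {
    private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();
    final List<String> log = new ArrayList<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    void publish(String topic, String orderId) {
        log.add("publish " + topic + " " + orderId);
        for (Consumer<String> handler : subscribers.getOrDefault(topic, List.of())) {
            handler.accept(orderId);
        }
    }

    // Wires the handlers in the same order as the topic table.
    static SagaFlowSketch wired() {
        SagaFlowSketch bus = new SagaFlowSketch();
        // order-service: payment succeeded, so the order moves to PREPARING.
        bus.subscribe("payments-paid", id -> bus.log.add("order " + id + " -> PREPARING"));
        // order-service: payment failed, so revert order and coupon, then notify product-service.
        bus.subscribe("payments-failed", id -> {
            bus.log.add("order " + id + " -> FAILED, coupon restored");
            bus.publish("orders-failed", id);
        });
        // product-service: roll back the reserved stock.
        bus.subscribe("orders-failed", id -> bus.log.add("stock restored for " + id));
        return bus;
    }
}
```

Publishing `payments-failed` on the wired bus runs the whole compensation chain in order, which makes the dependency between the two failure topics easy to see.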

6.4 Kafka Cluster Configuration

Item | Value | Reason
Broker count | 3 | Secures a quorum (2), high availability
Partition count | 3 | A multiple of the broker count → leaders are spread across brokers
Replica count | 3 | Prevents message loss even during a failure
Retention period | 7 days | Allows reprocessing of abnormal events

6.5 Consumer Group Strategy

• One Consumer Group per service
  ◦ To reduce rebalancing problems and keep message processing stable, groups are named in the form order-service-group, product-service-group
• Within a group, message parallelism = number of partitions
  ◦ Currently Partition=3 → up to 3 instances can process in parallel
• Exactly-once processing (at-least-once delivery combined with the idempotent consumers of 6.7)
  ◦ The Consumer commits explicitly only after a message has been processed successfully
  ◦ On failure, the message is not committed and is moved to the DLT → it can be reprocessed
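
A minimal sketch of the commit rule above, using a plain list in place of a partition. The class and field names are hypothetical, and the offset is simplified to the index of the last committed record. One fact worth keeping in mind: Kafka offset commits are cumulative, so a later successful commit also moves past an earlier record that was parked in the DLT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class CommitOrDeadLetterSketch {
    final List<String> deadLetters = new ArrayList<>(); // stand-in for the DLT
    int committedOffset = -1;                           // index of the last committed record

    // handler returns true when the record was processed successfully.
    void poll(List<String> records, Predicate<String> handler) {
        for (int offset = 0; offset < records.size(); offset++) {
            String record = records.get(offset);
            if (handler.test(record)) {
                committedOffset = offset;   // explicit commit after successful processing
            } else {
                deadLetters.add(record);    // no commit; the record is parked for reprocessing
            }
        }
    }
}
```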

6.6 DLT (Dead Letter Topic)

When a Consumer fails to process a message in Kafka, the message is moved to a DLT (Dead Letter Topic) and serves as the reference data for reprocessing or manual recovery.

Where the DLT applies

Failure location | Why the DLT is used
order-service (failed to process payments-failed) | A failed order/coupon revert must always be reprocessed
product-service (failed to process orders-failed) | A failed stock revert causes a business incident

DLT message schema

1) Event for a failure while processing payments-failed

Field | Content
orderId | Order ID
paymentKey | PG payment key
errorSource | order-service or product-service
errorCode | Internal error code
errorMessage | Error message
timestamp | Time the failure occurred

2) Event for a failure while processing orders-failed

Field | Content
orderDetailId | Order detail ID
optionValueId | Option ID
quantity | Quantity to be restored
paymentKey | PG payment key
errorSource | order-service or product-service
errorCode | Internal error code
errorMessage | Error message

DLT usage strategy

1. Automatic reprocessing through a DLT Consumer
   • Transient failures (a brief DB lock, network delay, and so on) can succeed on retry
   • Required for data consistency between the order and stock systems
2. Manual reprocessing by an operator
   • Structural failures (logic errors, schema problems, and so on) need manual action
3. DLT monitoring
   • Monitor the DLT backlog in Grafana / CloudWatch to detect operational failures early
   • Growth in the DLT of a specific topic can be read as a failure in the corresponding service

6.7 Idempotency Strategy

An idempotency strategy to prevent duplicate message processing and duplicate payment execution.

1) Payment API idempotency: Toss Idempotency-Key

• The client generates a UUIDv4-based idempotency key and sends it as Idempotency-Key
• Toss Payments handles idempotency internally on its servers
• On an ALREADY_PROCESSED_PAYMENT response → the compensating transaction must not be executed

2) Consumer idempotency

• Kafka delivery is at-least-once by default, so the same message may be delivered more than once
• The Consumer needs an idempotency check based on the state stored in the DB

Example (order-service):

    // Skip messages whose effect has already been applied.
    if (order.status == FAILED) return;     // compensation already done
    if (order.status == PREPARING) return;  // payment already confirmed
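
As a fuller illustration of the state-based check, the sketch below keeps order state in a map that stands in for the orders table. `IdempotentOrderConsumer` and its methods are hypothetical names, and treating only PENDING orders as eligible is an assumption that follows from the two guard conditions in the example.

```java
import java.util.HashMap;
import java.util.Map;

class IdempotentOrderConsumer {
    enum Status { PENDING, PREPARING, FAILED }

    // Stand-in for the orders table (assumption: a real service would query the DB).
    private final Map<String, Status> orders = new HashMap<>();

    void create(String orderId) { orders.put(orderId, Status.PENDING); }

    Status statusOf(String orderId) { return orders.get(orderId); }

    // Handles payments-failed; returns true only when the compensation actually ran.
    boolean onPaymentFailed(String orderId) {
        Status current = orders.get(orderId);
        if (current != Status.PENDING) {
            return false; // unknown order, or already FAILED/PREPARING: treat as duplicate
        }
        orders.put(orderId, Status.FAILED);
        return true;
    }
}
```

Delivering the same `payments-failed` message twice runs the compensation once; the second delivery is a no-op.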

6.8 Handling Producer Publish Failures

Option 1) Transactional Outbox Pattern

The DB update and the event record are wrapped in a single transaction to guarantee atomicity.
Advantages:
• No data loss even when Kafka is down
• Events can be reprocessed

Option 2) Publish-failure retry table + scheduler

• Mark publishStatus = FAILED
• The scheduler retries at 1-minute intervals
• Exponential Backoff can be applied
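
The two options can be combined in one small sketch: rows are written to an outbox list (standing in for the outbox table), a relay tick marks each row SENT or FAILED, and a helper computes an exponential backoff delay. All names here are hypothetical, the `publisher` predicate stands in for a Kafka producer call, and for brevity the relay does not itself wait for the backoff delay.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class OutboxRelaySketch {
    enum PublishStatus { PENDING, SENT, FAILED }

    static final class OutboxRow {
        final String topic;
        final String payload;
        PublishStatus status = PublishStatus.PENDING;
        int attempts = 0;

        OutboxRow(String topic, String payload) {
            this.topic = topic;
            this.payload = payload;
        }
    }

    final List<OutboxRow> table = new ArrayList<>(); // stand-in for the outbox table

    // Intended to run inside the same DB transaction as the business update.
    void enqueue(String topic, String payload) {
        table.add(new OutboxRow(topic, payload));
    }

    // One scheduler tick: try every unsent row; publisher returns true on broker ack.
    void relay(Predicate<OutboxRow> publisher) {
        for (OutboxRow row : table) {
            if (row.status == PublishStatus.SENT) continue;
            row.attempts++;
            row.status = publisher.test(row) ? PublishStatus.SENT : PublishStatus.FAILED;
        }
    }

    // Delay before the next retry: base * 2^(attempts - 1), capped at cap.
    static Duration backoff(int attempts, Duration base, Duration cap) {
        long factor = 1L << Math.min(Math.max(attempts - 1, 0), 20);
        Duration delay = base.multipliedBy(factor);
        return delay.compareTo(cap) > 0 ? cap : delay;
    }
}
```

With the 1-minute base from the scheduler interval and an assumed 10-minute cap, the retry delays would be 1, 2, 4, 8, 10, 10 minutes.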

6.9 Scheduler for Long-Pending Orders

Cleanup logic for orders that remain in the PENDING state for 5 minutes or more because of payment delays or external service errors.

Execution strategy

• Schedule interval: 1 minute
• Timeout threshold: 5 minutes

Processing steps

1. Query orders that have been PENDING for 5 minutes or more
2. Change the order status to FAILED
3. Publish coupon / stock rollback requests
4. After the DB update, publish the compensation event to Kafka

Improvements

1. Switch external system calls to asynchronous events (preferred approach)
   • The order service finishes its work with the DB update + Kafka event publish
   • The product/payment services subscribe to the topic and run their own rollback
   • No time is spent waiting for responses, which improves both stability and performance
2. Chunked processing (for large order volumes)
   • Process PENDING orders in chunks instead of querying all of them at once
   • Reduces system load
   • On failure, only the affected chunk needs to be reprocessed
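
The processing steps and the chunking idea can be sketched as a single sweep function. This is an illustration under assumptions: `PendingOrderSweeper`, the `Order` fields, and the use of creation time as the PENDING start are all hypothetical, and the returned ids stand for the compensation events that would be published to Kafka.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class PendingOrderSweeper {
    static final Duration TIMEOUT = Duration.ofMinutes(5); // timeout threshold from the text

    static final class Order {
        final String id;
        final Instant createdAt;
        String status = "PENDING";

        Order(String id, Instant createdAt) {
            this.id = id;
            this.createdAt = createdAt;
        }
    }

    // Marks stale PENDING orders FAILED, at most chunkSize per call, and returns
    // the ids for which a compensation event should be published.
    static List<String> sweep(List<Order> orders, Instant now, int chunkSize) {
        List<String> compensations = new ArrayList<>();
        for (Order order : orders) {
            if (compensations.size() == chunkSize) break; // leave the rest for the next tick
            boolean stale = Duration.between(order.createdAt, now).compareTo(TIMEOUT) >= 0;
            if (order.status.equals("PENDING") && stale) {
                order.status = "FAILED";
                compensations.add(order.id);
            }
        }
        return compensations;
    }
}
```

Because each call handles at most one chunk, a failure affects only that chunk, and the next 1-minute tick picks up whatever is still PENDING.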

7. CI/CD Pipeline

[Figure] GitOps pipeline diagram

GitHub Actions (CI)

• Lint → Build → Test → Docker Build → ECR Push
• Update Helm values

ArgoCD (CD)

• Automatically syncs the EKS state to the state of the GitOps repo
• Automatic Health/Sync management
• Helm-based release management
• Considered mounting an EBS or EFS volume for data preservation
  ◦ ArgoCD uses a GitOps-based stateless operating model, so a separate EBS/EFS mount is not needed
  ◦ ArgoCD can recreate Kubernetes resources from the Git repository, so even if the EKS cluster is rebuilt, the full state can be restored from Git

8. Deployment Strategy - Canary

Initially, deployments run as a manual Canary based on Istio VirtualService.

Background

• Provides a safer, more gradual rollout than Blue/Green
• Traffic can be split with Istio VirtualService
• Gradual verification to secure service stability
• Deployment history is preserved through GitOps

8.1 Deployment Flow

1. Deploy the new version
2. Define the v1/v2 subsets with a DestinationRule
3. Configure Canary traffic with a VirtualService
   • v1: 90%, v2: 10%
4. Monitor with Grafana + Tempo + Loki
   • Latency/Error, Log, Trace
5. Roll back on errors
   • v1 = 100%
6. Increase traffic when behavior is normal
   • 10 → 30 → 50 → 100

8.2 Canary Deployment Phases

Phase | Traffic ratio | Actions | Success criteria
Phase 0 | 0% → 10% | Deploy the new version (v2), apply VirtualService weight 10% | Pod Ready 100%, no errors
Phase 1 | 10% → 30% | Verify responses of the main APIs | Latency increase < 200ms, Error Rate < 1%
Phase 2 | 30% → 50% | Gradually shift real user traffic | 5xx < 0.3%, Business Error < 1%
Phase 3 | 50% → 100% | Switch all traffic | Pods stable, no anomalies in logs

8.3 Canary Failure Criteria (Rollback Conditions)

A rollback is performed by immediately switching the VirtualService back to Stable (v1).

Category | Condition
HTTP error rate | 5xx ratio of 3% or more (1-minute average)
Latency increase | Average response time increases by 200ms or more
Application error rate | Business Error of 1% or more
Readiness failure | New Pods fail to become Ready
Resource thresholds | CPU > 80%, Memory > 85%
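
Since the Canary is operated manually, the rollback table can be read as a simple decision function. The sketch below encodes those thresholds and the 10 → 30 → 50 → 100 weight steps; the class, method, and parameter names are hypothetical, and the metric values are assumed to be read by the operator from Prometheus/Grafana.

```java
class CanaryGateSketch {
    // v2 traffic weights, in the order they are applied.
    static final int[] WEIGHTS = {10, 30, 50, 100};

    // True when any rollback condition from the table is met.
    static boolean shouldRollback(double http5xxPct, double latencyIncreaseMs,
                                  double businessErrorPct, boolean newPodsReady,
                                  double cpuPct, double memoryPct) {
        return http5xxPct >= 3.0          // 5xx ratio, 1-minute average
            || latencyIncreaseMs >= 200.0 // average response time increase
            || businessErrorPct >= 1.0    // application-level errors
            || !newPodsReady              // readiness failure
            || cpuPct > 80.0
            || memoryPct > 85.0;
    }

    // Next v2 weight for the VirtualService: 0 on rollback, otherwise one step up.
    static int nextV2Weight(int currentV2Weight, boolean rollback) {
        if (rollback) return 0; // v1 = 100%
        for (int weight : WEIGHTS) {
            if (weight > currentV2Weight) return weight;
        }
        return 100;
    }
}
```

For example, a canary at 30% with healthy metrics moves to 50%, while the same canary with a 3% 5xx ratio goes back to 0%.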

8.4 Components

Component | Role
Deployment (v1/v2) | Pods per version
DestinationRule | Subset definition (label-based)
VirtualService | Traffic ratio adjustment
Envoy Sidecar | Performs the actual traffic distribution
ArgoCD Sync | GitOps-based automatic deployment

8.5 Considerations (Manual Canary Operation)

Item | Description
Observability | Grafana / Loki / Tempo need to be integrated
Sticky Session | Session Redis compatibility must be verified for APIs that require session consistency
Need for test automation | Consider API tests with k6/JMeter or similar to verify each Canary phase
Resource Impact | Pod count increases because v1 and v2 run side by side → handled by HPA/Karpenter
Future extensibility | Manual Canary is used for now given the project timeline. If Auto Rollback is introduced later, extend with Argo Rollouts + AnalysisTemplate

9. Autoscaling Strategy

9.1 Pod Autoscaling (HPA)

9.1.1 HPA Policy Criteria

• CPU utilization > 70%
• Memory utilization > 75%
• Automatic scaling based on the target metric
• Account for v1 + v2 Pods coexisting during a Canary deployment
• Apply a minimum-replica buffer strategy that accounts for Karpenter node creation time
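
For reference, the replica count the HPA aims for follows the documented Kubernetes rule desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), skipped when the ratio is within the default 10% tolerance. The sketch below applies that rule with the 70% CPU target; the class and parameter names are hypothetical, and the min/max bounds in the usage are example values only.

```java
class HpaMathSketch {
    // Computes the replica count the HPA would aim for, clamped to [min, max].
    static int desiredReplicas(int current, double currentUtilPct, double targetUtilPct,
                               int min, int max) {
        double ratio = currentUtilPct / targetUtilPct;
        if (Math.abs(ratio - 1.0) <= 0.10) {
            return current; // within the default tolerance: no scaling
        }
        int desired = (int) Math.ceil(current * ratio);
        return Math.max(min, Math.min(max, desired));
    }
}
```

With 2 replicas at 90% CPU against a 70% target, the result is 3 replicas; at 72% nothing changes because the ratio is inside the tolerance band.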

9.1.2 Per-Service HPA minReplicas Buffer Strategy

Minimum replicas are kept with headroom so that existing Pods are not overloaded while Karpenter provisions a node.

Service type | Traffic characteristics | Initial minReplicas buffer
Gateway (Ingress) | Concentrated external requests, latency-sensitive | 150% of normal Pod count
Order / Payment | Latency-sensitive, high failure impact | 130~150%
Product | High read frequency | 120~130%
User | General API | 120%
AI | Low importance, low usage | 100~110%
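
The buffer percentages translate into a minReplicas value by scaling the normal Pod count and rounding up. A minimal sketch, assuming the baseline Pod count per service is known from monitoring (the document does not state these baselines, so the inputs below are illustrative):

```java
class MinReplicaBuffer {
    // minReplicas = ceil(baselinePods * bufferPercent / 100), never below 1.
    static int minReplicas(int baselinePods, int bufferPercent) {
        int scaled = (baselinePods * bufferPercent + 99) / 100; // integer ceiling
        return Math.max(1, scaled);
    }
}
```

For instance, an assumed baseline of 2 Pods with a 150% buffer gives a minReplicas of 3, and a baseline of 4 Pods with a 120% buffer gives 5.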

9.1.3 Per-Service HPA Settings

Service | Min~Max | Characteristics
Gateway (Ingress) | 3~6 | Request concentration point, stability is critical
User/Product | 2~10 | Read-heavy, wide traffic swings
Order/Payment | 2~5 | Order/payment-centric, stability is critical
AI | 1~2 | Kept at minimum

9.1.4 Requests / Limits

• Provisional: GPT-suggested values, to be adjusted based on monitoring

Service | CPU Request | CPU Limit | Memory Request | Memory Limit
Gateway | 250m | 500m | 512Mi | 1Gi
User/Product | 200m | 400m | 256Mi | 512Mi
Order | 200m | 500m | 512Mi | 1Gi
Payment | 300m | 600m | 512Mi | 1Gi
AI | 100m | 300m | 256Mi | 512Mi

9.2 Node Autoscaling (Karpenter)

• Karpenter is introduced with the goal of automatic node provisioning

9.2.1 Background for Adopting Karpenter

Problem | Description
Limits of ECS / the existing EKS setup | NodeGroup scale-out is slow when Pods increase
Fixed node cost | Nodes must be kept running at all times, which raises cost
Temporary Pod increase during Canary | v1 and v2 Pods run at the same time during a Canary → node shortage
Traffic variability | The Product and Order services have wide traffic swings

9.2.2 Karpenter Flow

HPA adds Pods → existing node capacity is insufficient → Pods go Pending → Node Provisioning (Karpenter) → Pod scheduling completes → traffic decreases → idle nodes appear → idle nodes are removed automatically (Consolidation)

9.2.3 Autoscaling Considerations

• Apply the minReplicas buffer to keep existing Pods from being overloaded while Karpenter provisions nodes
• v1/v2 replicas grow at the same time during a Canary deployment → Karpenter must scale out immediately
• Minimize cost through Consolidation when traffic decreases
• If Pod requests are set too low, excessive scale-out occurs, so requests must be sized appropriately
• Consider using Spot Instances

10. Observability

10.1 Components

Layer | Component | Role
Metrics | Prometheus | Collects application, Istio, and node metrics
Logs | Fluent Bit → Loki | Collects and stores stdout logs
Tracing | Envoy / Spring Boot OTel → OTel Collector → Tempo | Stores distributed traces
Dashboard | Grafana | Visualizes metrics, logs, and traces
Infra Logs | CloudWatch Logs | Stores Node, Control Plane, and EKS infrastructure logs

10.2 Metrics Collection Flow

10.2.1 Application / Service Metrics

• Use /actuator/prometheus from Spring Boot Actuator
• Collect metrics with the Prometheus Operator
• Scrape through a ServiceMonitor per service

Data flow:
Application Pod (/actuator/prometheus) → ServiceMonitor → Prometheus → Grafana

10.2.2 Istio (Envoy) Metrics

• Enable Istio Telemetry v2
• Prometheus scrapes Istio's Envoy proxies
• Use a ServiceMonitor

Envoy Sidecar → Prometheus (ServiceMonitor) → Grafana (Istio Dashboard)

10.2.3 Node / System Metrics

• Components
  ◦ kube-state-metrics
  ◦ node-exporter (using a PodMonitor)
• Collected metrics
  ◦ CPU / Memory
  ◦ Pod status
  ◦ Node status / Disk I/O
  ◦ Network and other infrastructure resource status

10.3 Log Collection Flow

Components

• Fluent Bit DaemonSet
• Loki
• S3

Log flow

Container stdout/stderr → Fluent Bit → Loki (log data stored in S3) → Grafana

Main Fluent Bit filters

• JSON Log Parser
• Level filter

10.4 Tracing Collection Flow

Traces are collected from both Envoy (network level) and the application (business level), and are stored in Tempo through the OTel Collector.

10.4.1 Envoy → Tempo (Zipkin)

1. Envoy generates traces in Zipkin format
   • The Istio proxy (Envoy) supports Zipkin-format traces by default
   • Both the Ingress Gateway and the sidecars generate spans
2. The OTel Collector receives the Zipkin format
   • The Collector's Zipkin receiver accepts the data sent by Envoy
   • It is converted to OTLP internally → exported to Tempo
3. The Collector exports to Tempo

Data flow:
Envoy (Zipkin) → OTel Collector (zipkin receiver) → OTel Collector (otlp exporter) → Tempo → Grafana

10.4.2 Application → Tempo (OTLP)

Java services are configured with the OpenTelemetry Java Agent and send traces over OTLP/gRPC.

Application (OTLP over gRPC) → OTel Collector (otlp receiver) → OTel Collector (otlp exporter) → Tempo → Grafana

10.4.3 End-to-End Tracing

Envoy spans and application spans share the same Trace ID, so they are joined into a single end-to-end transaction flow.

11. High Availability & Disaster Recovery Strategy

• Operate EKS Multi-AZ NodeGroups
• RDS Multi-AZ Failover
• Redis Multi-AZ
• ArgoCD Rollback
• S3 Versioning
• Re-provision infrastructure with Terraform

12. Cost Management Strategy

Item | Strategy
EKS | Keep On-demand NodeGroups minimal + scale automatically with Karpenter
Node | Minimize idle nodes through Consolidation
NAT Gateway | Minimal configuration for Dev/Stage
CloudWatch Logs | 15-day retention
Redis | ClusterMode Off
MSK | Minimal configuration centered on the payment service
ECR | Use Private ECR