Date written: 2025.11.18
Related service: Jenkins CI / Docker Build Pipeline
Incident type: Jenkins hang caused by EBS I/O performance exhaustion
Related incident report:
[TroubleShooting] Jenkins Incident Analysis
1. Overview
This document records the actions taken after the Jenkins outage of 2025.11.17 23:50 ~ 00:10 and the resulting performance improvements.
The root cause was the following chain on the EBS (gp2) volume used by the Jenkins EC2 instance:
BurstBalance exhaustion → I/O stall → OS hang
As a result, the Jenkins UI, SSH, and Docker all became unresponsive.
The document is organized as: incident analysis summary → actions taken → performance comparison.
2. Incident Cause Summary
Root Cause
• The Docker root (/var/lib/docker) of the Jenkins EC2 instance was located on the instance's main EBS volume (gp2).
• ReadOps/WriteOps surged during Docker build and image layer operations.
• Because of how gp2 works, burst credits were exhausted → BurstBalance 0%.
• VolumeTotalReadTime/WriteTime peaked at 119 seconds and QueueLength spiked
→ disk responses stopped arriving, so processes blocked at the OS level.
• As a result:
◦ Jenkins UI: 504 Gateway Timeout
◦ SSH: connection not possible
◦ Docker: build interrupted
◦ EC2 Status Check: normal (not a hardware/network problem)
Primary cause: storage I/O bottleneck (insufficient EBS performance)
[Figures: BurstBalance, VolumeTotalReadTime, and VolumeQueueLength graphs]
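To make the BurstBalance exhaustion concrete, the drain time of a gp2 burst bucket can be estimated with simple arithmetic. The sketch below assumes the documented gp2 parameters (a baseline of 3 IOPS per GiB with a 100 IOPS floor, and a bucket of 5.4 million I/O credits) and a hypothetical sustained load of 3,000 IOPS; the actual load during the incident is not recorded in this report.

```shell
#!/usr/bin/env bash
# Estimate how long a full gp2 burst bucket lasts under sustained load.
# Assumed gp2 parameters: 3 IOPS/GiB baseline (minimum 100), 5,400,000 credits.

gp2_baseline_iops() {
  local size_gib="$1"
  local baseline=$(( size_gib * 3 ))
  if [ "$baseline" -lt 100 ]; then baseline=100; fi
  echo "$baseline"
}

gp2_burst_seconds() {
  local size_gib="$1" load_iops="$2"
  local baseline
  baseline=$(gp2_baseline_iops "$size_gib")
  # The bucket does not drain when the load is at or below the baseline
  if [ "$load_iops" -le "$baseline" ]; then
    echo "inf"
    return 0
  fi
  # Credits are spent at (load - baseline) per second
  echo $(( 5400000 / (load_iops - baseline) ))
}

# 20 GiB root volume under a hypothetical sustained 3000 IOPS load
echo "baseline: $(gp2_baseline_iops 20) IOPS"
echo "bucket drains in: $(gp2_burst_seconds 20 3000) s"
```

Under these assumptions a 20GB gp2 volume has a 100 IOPS baseline, and a full bucket lasts roughly 31 minutes of sustained heavy I/O before the volume is throttled to that baseline.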
3. Previous Volume Layout vs. Volume Layout After the Change
In the previous layout, Docker shared the OS volume (EBS 20GB), so Docker build work created a bottleneck for the entire file system. The layout was improved as follows.
Before
• Root EBS (gp2, 20GB)
◦ OS
◦ /var/lib/docker (Docker images/layers) ← point of failure
• Jenkins Data EBS (gp2, 20GB)
◦ /var/jenkins_home (Jenkins state such as job configuration, plugins, and build history)
◦ Volume for preserving Jenkins data
After
• Root EBS (gp2, 20GB)
• Jenkins Data EBS (gp3, 50GB)
◦ /var/jenkins_home (unchanged)
• New dedicated Docker EBS (gp3, 50GB, 6000 IOPS, Throughput 125MB/s)
◦ Dedicated to /var/lib/docker
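One way to confirm that the "After" layout is in effect is to check that /var/lib/docker no longer shares a block device with the root file system. The following is a minimal sketch, assuming GNU coreutils `stat`; the function names `on_separate_devices` and `check_layout` are hypothetical and not part of the original userdata.

```shell
#!/usr/bin/env bash
# Succeeds only when the two paths live on different devices.
# Returns 2 if either path cannot be inspected.
on_separate_devices() {
  local dev_a dev_b
  dev_a=$(stat -c '%d' "$1") || return 2
  dev_b=$(stat -c '%d' "$2") || return 2
  [ "$dev_a" != "$dev_b" ]
}

# Report whether the Docker root is isolated from the root volume.
check_layout() {
  if on_separate_devices / /var/lib/docker; then
    echo "OK: /var/lib/docker is isolated from the root volume"
  else
    echo "WARN: /var/lib/docker shares a device with / (or could not be checked)"
  fi
}
```

Running `check_layout` on the Jenkins host after boot would be expected to print the OK line once the dedicated volume is mounted.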
4. Incident Response and Actions Taken
Immediately after the incident, the following actions were taken to resolve the Jenkins I/O bottleneck.
4.1 Adding and Attaching a Dedicated Docker EBS Volume (gp3)
1) Terraform: create the dedicated Docker EBS volume
resource "aws_ebs_volume" "docker" {
  count             = length(local.docker_existing_ids) > 0 ? 0 : 1
  availability_zone = var.az
  size              = var.docker_ebs_size       # 50
  type              = var.docker_ebs_type       # gp3
  iops              = var.docker_ebs_iops       # 6000
  throughput        = var.docker_ebs_throughput # 125

  tags = {
    Name = "${var.prefix}-docker-data"
  }

  lifecycle {
    prevent_destroy = true
  }
}
2) Terraform: attach the Docker volume to EC2
resource "aws_volume_attachment" "docker" {
  device_name  = "/dev/sdg"
  volume_id    = local.docker_volume_id
  instance_id  = aws_instance.this.id
  force_detach = true

  lifecycle {
    ignore_changes = [volume_id]
  }
}
4.2 Migrating /var/lib/docker via Userdata
At boot, the following steps run automatically:
1. Format the dedicated Docker EBS volume as ext4
2. Mount it at /var/lib/docker
3. Register it in /etc/fstab
4. Restart the Docker service
1) Userdata (partial excerpt)
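DOCKER_VOL_ID="${docker_volume_id}"
DOCKER_MNT="/var/lib/docker"
DOCKER_DEVICE=""

# Wait up to about 4 minutes (120 x 2s) for the attached volume to appear
for i in $(seq 1 120); do
  if DOCKER_DEVICE=$(find_nvme_device "$DOCKER_VOL_ID"); then
    break
  fi
  echo "Retry $i/120: Docker volume not found..."
  sleep 2
done

# Abort instead of formatting or mounting an empty device path
if [ -z "$DOCKER_DEVICE" ]; then
  echo "Docker volume $DOCKER_VOL_ID not found; aborting" >&2
  exit 1
fi

# 1) Format only if the device has no filesystem yet
if ! blkid "$DOCKER_DEVICE" >/dev/null 2>&1; then
  mkfs.ext4 "$DOCKER_DEVICE"
fi

# 2) Stop docker
systemctl stop docker || true

# 3) Mount and register in fstab
#    Note: NVMe device names can change across reboots
mkdir -p "$DOCKER_MNT"
if ! mountpoint -q "$DOCKER_MNT"; then
  mount "$DOCKER_DEVICE" "$DOCKER_MNT"
  echo "$DOCKER_DEVICE $DOCKER_MNT ext4 defaults,nofail 0 2" >> /etc/fstab
fi

# 4) Restart docker
systemctl start docker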
As a result, all Docker I/O goes only to the Docker EBS volume (gp3), and the root disk and the Jenkins Data disk are almost unaffected.
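The userdata excerpt calls `find_nvme_device`, whose body is not shown in this report. The following is a minimal sketch of what such a helper could look like, assuming a Nitro-based instance where udev exposes each EBS volume under /dev/disk/by-id as `nvme-Amazon_Elastic_Block_Store_<volume ID without the hyphen>`. The `NVME_BY_ID_DIR` override is a hypothetical addition to make the function testable and is not taken from the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve an EBS volume ID to its NVMe block device.
find_nvme_device() {
  local vol_id="$1"
  local by_id_dir="${NVME_BY_ID_DIR:-/dev/disk/by-id}"
  # The NVMe serial is the volume ID without the hyphen: vol-0abc -> vol0abc
  local serial="${vol_id//-/}"
  local link="${by_id_dir}/nvme-Amazon_Elastic_Block_Store_${serial}"
  # Return non-zero while the device is absent so a retry loop keeps waiting
  [ -e "$link" ] || return 1
  # Print the resolved device path, e.g. /dev/nvme1n1
  readlink -f "$link"
}
```

If a helper like this is embedded in a Terraform `templatefile`, each shell `${...}` expansion has to be written as `$${...}` so that Terraform does not try to interpolate it.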
4.3 Keeping the Jenkins Data Volume
• The existing Jenkins Data volume (/mnt/jenkins_data → /var/jenkins_home inside the container) existed before the incident, and in this work it was kept as-is with no formatting, deletion, or modification.
• In userdata, the Jenkins volume is likewise NVMe-mapped and then only mounted at /mnt/jenkins_data.
Therefore Jenkins job configuration, plugins, build history, and so on were preserved, and this change caused no Jenkins data loss.
5. Performance and Stability Improvement Results
5.1 EBS Metrics
Metric | During incident (gp2 + shared with Docker) | After change (dedicated Docker gp3) | Improvement |
VolumeTotalReadTime | Up to 119 s | About 5~9 ms on average | About 13,000x lower |
VolumeTotalWriteTime | Tens of seconds | A few ms | Normal range |
VolumeQueueLength | 10~30 | 0 ~ 0.2 | Queue cleared |
BurstBalance | 0% | gp3 has no credit concept | Structural fix |
ReadOps/WriteOps | Sharp spikes + long tail | Short spike, then straight back to 0 | I/O congestion resolved |
[Figures: VolumeReadBytes / WriteBytes, VolumeReadOps / WriteOps, VolumeTotalReadTime / WriteTime, and VolumeQueueLength graphs]
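The "about 13,000x" figure for VolumeTotalReadTime can be reproduced from the table values. The sketch below is plain integer arithmetic on the reported numbers (119 s peak, 5~9 ms average); because it compares a peak with an average, it should be read as an order-of-magnitude estimate only. The function name `improvement_ratio` is illustrative.

```shell
#!/usr/bin/env bash
# Ratio between the incident peak and the post-change average, in whole multiples.
improvement_ratio() {
  local before_ms="$1" after_ms="$2"
  echo $(( before_ms / after_ms ))
}

before_ms=$(( 119 * 1000 ))   # 119 s peak during the incident, in ms
echo "vs 9 ms: $(improvement_ratio "$before_ms" 9)x"
echo "vs 5 ms: $(improvement_ratio "$before_ms" 5)x"
```

With the upper end of the post-change range (9 ms) the ratio is about 13,000x, and with the lower end (5 ms) it is closer to 24,000x.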
5.2 Jenkins Pipeline Performance Improvement
Item | Before change | After change |
Docker Build & Push stage | Not measurable due to the incident | About 1 min 2 s |
Full pipeline run time | Not measurable due to the incident | Completes reliably in about 1 min 35 s |
Jenkins UI responsiveness | Very slow / timeouts during builds | UI responds immediately even during builds |
[Figures: Before / After screenshots]
6. Conclusion
• The cause of the Jenkins outage was neither the EC2 instance nor the Jenkins Data volume, but a layout in which Docker I/O was concentrated on the shared root EBS volume (gp2).
• The existing dedicated Jenkins Data volume was kept as-is, and
• a dedicated Docker gp3 EBS volume was added and /var/lib/docker was moved onto it, so that:
◦ the I/O bottleneck was eliminated
◦ both the stability and the performance of the Jenkins pipeline improved significantly.
This change can be assessed as work that raised the stability of Jenkins CI at the storage level through a clear separation of roles:
"Build traffic to the Docker volume, Jenkins state to the Data volume, OS to the Root volume."













