Description
Hello,
I'm using ALB controller v2.2.0 and Ingress with instance target type.
I want to know how to perform a rolling update with zero downtime.
I ran some tests:
1. without a preStop hook and with the default terminationGracePeriodSeconds (30s)
- some 502 errors during the rolling update
2. preStop hook sleep: 40s, terminationGracePeriodSeconds: 70s
- 502 errors are very rarely seen
3. preStop hook sleep: 70s, terminationGracePeriodSeconds: 100s
- no 502 errors observed (yet)
According to the linked comments, is 40s not an appropriate value? #1719 (comment), #1719 (comment)
(controller process time + ELB API propagation time + HTTP req/resp RTT + kubeproxy's iptable update time)
How can I find the right value for the sleep time?
A longer sleep time means more containers running simultaneously during the rollout, I think.
Also, is the sleep time not related to the target group's deregistration delay?
(the target group used in the tests above has a 300s deregistration delay, but a sleep of just 70 seconds was enough to eliminate the 502 errors)
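For context, the 300s deregistration delay in these tests is a target group attribute; with the AWS Load Balancer Controller it can be set on the Ingress via the `alb.ingress.kubernetes.io/target-group-attributes` annotation (the Ingress name below is a placeholder):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress # placeholder name
  annotations:
    kubernetes.io/ingress.class: alb
    # Matches the 300s deregistration delay used in the tests above
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=300
```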
Expected outcome
No downtime during deployments, i.e. no 502 errors.
Environment
- AWS Load Balancer controller version: v2.2.0
- Kubernetes version: 1.19
- Using EKS (yes/no), if so version?: 1.19
Additional Context:
test script (macOS; `gdate` comes from GNU coreutils, e.g. `brew install coreutils`):
#!/bin/bash
# Continuously probe the service and count 200 vs non-200 responses,
# printing the response time of each request in milliseconds.
count_ok=0
count_not_ok=0
while true; do
  beginTime=$(gdate +%s%3N)
  status_code=$(curl --write-out '%{http_code}' --silent --output /dev/null {test_url})
  endTime=$(gdate +%s%3N)
  elapsed=$((endTime - beginTime))
  echo "StatusCode: $status_code, RT: $elapsed msec"
  if [ "$status_code" -eq 200 ]; then
    ((count_ok++))
  else
    ((count_not_ok++))
  fi
  echo "200: $count_ok, not 200: $count_not_ok"
done