How can we find right value of sleep time for zero downtime rolling update? #2106

Closed
sechunOH opened this issue Jun 28, 2021 · 6 comments

@sechunOH

sechunOH commented Jun 28, 2021

Hello,
I'm using the AWS Load Balancer Controller v2.2.0 and an Ingress with the instance target type.

I want to know how to do a rolling update with zero downtime.

I ran some tests (see the manifest sketch after this list):

1. without a preStop hook, with the default terminationGracePeriodSeconds (30s)
- some 502 errors during the rolling update

2. preStop hook sleep: 40s, terminationGracePeriodSeconds: 70s
- 502 errors are very rarely found

3. preStop hook sleep: 70s, terminationGracePeriodSeconds: 100s
- 502 errors are not found (yet)
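
For reference, here is a minimal sketch of how these settings fit into a Deployment spec (the name and image are placeholders; values are taken from scenario 3):

# Sketch only: values from scenario 3 (use 40 / 70 for scenario 2)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                             # placeholder
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 100   # scenario 3 (70s in scenario 2)
      containers:
        - name: my-app
          image: my-app:latest             # placeholder
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "70"]   # scenario 3 (40s in scenario 2); the image needs a sleep binary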

According to these comments, is 40s not a proper value? #1719 (comment), #1719 (comment)
(controller process time + ELB API propagation time + HTTP req/resp RTT + kube-proxy's iptables update time)

How can I find the right value for the sleep time?
A longer sleep time means more containers run simultaneously during the rollout, I think.

Also, is the sleep time not related to the deregistration delay of the target group?
(The target group used in the tests above has a 300s deregistration delay, but a 70-second sleep was enough to eliminate the 502 errors.)

Expected outcome

A zero-downtime deployment with no 502 errors.

Environment

  • AWS Load Balancer controller version: v2.2.0
  • Kubernetes version: 1.19
  • Using EKS (yes/no), if so version?: 1.19

Additional Context:

Test script (macOS):

#!/bin/bash
# Polls {test_url} in a loop and counts 200 vs non-200 responses.
# gdate (GNU date from coreutils) is used for millisecond timestamps on macOS.

count_ok=0
count_not_ok=0

for (( ; ; ))
do
  beginTime=$(gdate +%s%3N)
  status_code=$(curl --write-out '%{http_code}' --silent --output /dev/null {test_url})
  endTime=$(gdate +%s%3N)
  elapsed=$((endTime - beginTime))
  echo "StatusCode: $status_code, RP: $elapsed msec"
  if [ "$status_code" -eq 200 ]; then
    ((count_ok=count_ok+1))
  else
    ((count_not_ok=count_not_ok+1))
  fi
  echo "200: $count_ok, not 200: $count_not_ok"
done
sechunOH changed the title from "Got 502 error even when prestop hook applied" to "How can we find right value of sleep time for zero downtime rolling update?" on Jun 28, 2021
@M00nF1sh
Collaborator

M00nF1sh commented Jul 7, 2021

@sechunOH

controller process time: it depends on the size/load of your cluster; you should be able to get it from the controller's metrics.
ELB API propagation time: we checked with the ELB team; they don't have a P99 for it, but it should be less than 60 seconds for ALB.
HTTP req/resp RTT: depends on your application.
kube-proxy's iptables update time: ranges between 10 and 30 seconds, so take it as capped at 30 seconds.

So it's not proper to set it to 40 seconds, and we currently don't offer an optimal setting either, since there are a lot of variables above. You should tune it according to your application and cluster usage.

@sechunOH
Author

sechunOH commented Jul 9, 2021

@M00nF1sh
Thank you for the details.
Which metric should I monitor for the controller process time?
Should I just keep watching the ALB controller logs?

< additional questions >
I tested some cases, but I could not figure this out.

During a rolling update, the pod status changes like this:

Running -> Terminating -> Terminated

In more detail, between "Terminating" and "Terminated":

Terminating -> (preStop hook time) -> SIGTERM sent -> (SIGKILL sent if not terminated) -> Terminated

I set the preStop hook timeout to 150s (over 2 minutes), but health check requests kept coming until the Node.js application was terminated by SIGTERM.

Is the ALB implemented to deregister a pod in "Terminating" status when using the instance target type?
If not, how can I redeploy applications with zero downtime using a preStop hook?

I think there is no way for an application to know that it is in the preStop hook, so it cannot reject health check requests from the ALB. In the end, the application is terminated while some requests go unanswered (if the application has no graceful shutdown logic).

Can't I redeploy with zero downtime using just Kubernetes and the ALB?
(Of course, I can implement SIGTERM handling logic (in my case, SIGINT from PM2), set the keep-alive timeout to more than the ALB's idle connection timeout, and use PM2's kill timeout for graceful shutdown.)

What is the expected behaviour of Kubernetes and the ALB controller during the preStop duration?
(I think pod readinessGates fit the IP target type, right?)

The most important thing is to know when the ALB stops health checking a "Terminating" pod.
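
(As far as I understand, with the IP target type the controller can inject pod readiness gates when the namespace carries the label below, so a rollout waits until the new pods are registered and healthy in the target group. Sketch only; the namespace name is a placeholder.)

# Sketch: opt a namespace in to pod readiness gate injection (IP target type)
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace                       # placeholder
  labels:
    elbv2.k8s.aws/pod-readiness-gate-inject: enabled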

@sechunOH
Author

@M00nF1sh
I tested it more and realized that the ALB needs some seconds to deregister "Terminating" pods.

In the comment above, I was confused about why the ALB kept checking the health of a terminating pod.
After some tests, I realized that was not the ALB's health check, but the readiness probe check.
(I had set the same path for the readiness probe and the target group health check.)

In our case, a preStop sleep of 5 seconds is enough. After that, the ALB no longer sends traffic to "Terminating" pods.
After the preStop, I handle the SIGTERM signal for a graceful shutdown of our Node.js applications
(handling keep-alive connections, setting the keep-alive timeout longer than the ALB's idle connection timeout, etc.).
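
Roughly, the pod spec ends up looking like this sketch (names, port, and path are placeholders; the SIGTERM handling itself lives in the Node.js app and is not shown):

# Sketch of the final setup: a short preStop sleep, then the app handles SIGTERM itself
spec:
  terminationGracePeriodSeconds: 30        # assumption: value not stated above, default shown
  containers:
    - name: my-app                         # placeholder
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "5"]        # give the ALB time to stop sending new traffic
      readinessProbe:
        httpGet:
          path: /healthz                   # placeholder; same path as the target group health check
          port: 8080                       # placeholder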

Thank you very much for the conversation.
Good luck.

@MatthiasWinzeler

MatthiasWinzeler commented Jun 14, 2022

FWIW, if someone faces the same issue and stumbles upon this thread:

We ran into the same issue and contacted AWS support. Their statement was that it can indeed happen that, after deregistration, the ALB still sends new requests to the target. This should be compensated for with a preStop sleep; they recommended 60 seconds to be on the safe side.

With 60 seconds, our load tests did not show any 502 errors during rolling upgrades.

We were also told that the issue (the ALB sending requests to draining targets) should be fixed in the future, so we expect to eventually be able to decrease the preStop sleep to a lower number.

@jyotibhanot

@MatthiasWinzeler: What should the value of the preStop hook be? Some posts suggest terminationGracePeriodSeconds > preStop sleep > deregistration delay, while others suggest the preStop sleep only needs to be controller process time + ELB API propagation time + HTTP req/resp RTT. How can we calculate the preStop hook value?

@MatthiasWinzeler

@jyotibhanot We don't have long-running requests (which, I think, would require respecting the deregistration delay). So for us, only terminationGracePeriodSeconds > preStop and controller process time + ELB API propagation time + HTTP req/resp RTT appear to matter.

To figure it out for your use case, AWS recommended simply testing with your applications under some realistic load.

richardTowers added a commit to alphagov/govuk-helm-charts that referenced this issue Jul 24, 2024
During a recent terraform apply in integration which rolled the k8s
nodes, we saw a number of 502 / 503 responses from the load balancers.

The theory is that this is due to kubernetes-sigs/aws-load-balancer-controller
issue #2366 - pods and load balancers are updated at the same time, but
load balancer updates don't happen instantly, so the load balancer may
continue to send traffic to pods which are terminating.

[A comment on another issue](kubernetes-sigs/aws-load-balancer-controller#2106 (comment))
suggests that 60 seconds is long enough to avoid any 502s, although the
rest of the comments suggest "it depends" on various factors.

Whatever, 15 seconds does not seem to be long enough for us to be able
to roll our nodes without serving some 502s, so we should try a higher
value.

I guess higher values will result in slower deployments (not just node
rollouts, but anything which requires pods to terminate and new pods to
come up). 60 seconds still feels just about tolerable to me, but I don't
think we'd want to go much higher than this.