Getting 502/504 with Pod Readiness Gates during rolling updates #1719
Comments
We've been having the same issue. We confirmed with AWS that there is some propagation time between when a target is marked draining in a target group and when that target actually stops receiving new connections. So, at the suggestion of other issues I've seen in the old project for this, we added a 20s sleep in a preStop hook.
@calvinbui The pods need a preStop hook to sleep, since most web servers (e.g. nginx/apache) will stop accepting new connections once a soft stop (SIGTERM) is requested. It takes some time for the controller to deregister the pod (after it gets the endpoint change event), and it takes time for the ELB to propagate target changes to its data plane. @AirbornePorcine did you still see 502s with the 20s sleep? Have you enabled the pod readinessGate? If you are using instance mode, you need an extra 30-second sleep (since kube-proxy updates iptables rules every 30 seconds).
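To make the sequencing above concrete, here is a minimal sketch of the preStop-sleep pattern being discussed; the 35-second sleep, grace period, names, and image are illustrative assumptions, not values recommended by the controller project:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                         # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # Must be longer than the preStop sleep plus the app's own shutdown time.
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: nginx                 # placeholder image
          ports:
            - containerPort: 80
          lifecycle:
            preStop:
              exec:
                # Keep serving while the controller deregisters the target and the
                # ELB propagates the change; add ~30s more when using instance mode.
                command: ["sh", "-c", "sleep 35"]
```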
@M00nF1sh that's correct: even with a 20s sleep and the auto-injected readinessGate, doing a rolling restart of my pods results in a small number of 502s. For reference, this is about 5-6 502s out of 1m total requests in the same time period, so a very small amount, but still not something we want. I'm using IP mode here.
@AirbornePorcine in my own test, the sum of … And the preStop hook sleep only needs to be … Just asked the ELB team whether they have p90/p99 metrics available for …
Ok, so, we just did some additional testing on that sleep timing. The only way we've been able to get zero 502s during a rolling deploy is to set our preStop sleep to at least the target group deregistration delay plus 5 seconds.

Looking back in my emails, I realized this is exactly what AWS support had previously told us to do: don't stop the target from processing requests until at least the target group deregistration delay has elapsed (we added the 5s to account for the controller processing and propagation time, as you mentioned). Next week we'll try tweaking our deregistration delay and see if the same holds true (it's currently 60s, but we really don't want to sleep that long if we can avoid it).

Something you might want to try though, @calvinbui!
Thanks for the comments. After adding a preStop hook with a sleep, I was able to get all 200s during a rolling update of the deployment. I set the deregistration time to 20 seconds and the sleep to 30 seconds. However, during a node upgrade/rolling update I got 503s for around one minute. Are there any recommendations from AWS about that? I'm guessing I would need to bump up the deregistration and probably the sleep times a lot higher to allow the new node to fire up and the new pods to start as well.
After increasing the sleep to 90s and …, node upgrades no longer caused downtime. However, if a deployment only has 1 replica, there is still ~1 min of downtime. For deployments with >=2 replicas, this was not a problem and no downtime was observed. The documentation should be updated, so I'll leave this issue open.

EDIT: For the 1-replica issue, it was because k8s doesn't do a rolling deployment during a cluster/node upgrade. It is considered involuntary, so I had to scale up to 2 replicas and add a PDB.
How about (ab)using a ValidatingAdmissionWebhook to delay pod deletion? The sketch of the idea: have the webhook intercept pod DELETE requests and hold them back until the load balancer has deregistered and drained the target, then let the deletion proceed.

edit: I've implemented this idea in a chart here: https://github.com/foriequal0/pod-graceful-drain
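As a rough illustration of the shape of such a webhook registration, here is a sketch that intercepts pod DELETE requests so a controller can delay them; every name below is hypothetical, and the actual pod-graceful-drain chart is more involved than this:

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-deletion-delay              # hypothetical name
webhooks:
  - name: delay-pod-deletion.example.com   # hypothetical webhook name
    admissionReviewVersions: ["v1"]
    sideEffects: NoneOnDryRun
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
        operations: ["DELETE"]
    clientConfig:
      service:
        name: pod-drain-webhook          # hypothetical service that holds the request
        namespace: kube-system           # while the target is deregistered, then allows it
        path: /validate
    failurePolicy: Ignore                # don't block deletions if the webhook is down
    timeoutSeconds: 30                   # max delay a single admission call can add
```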
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its lifecycle rules. You can mark this issue as fresh with /remove-lifecycle stale or close it with /close. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
This is still a serious issue, any update on it? We currently use the solution from @foriequal0, which has been doing a great job so far. I wish this were handled officially by the controller project itself.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to its lifecycle rules. You can mark this issue as fresh with /remove-lifecycle stale or close it with /close. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
What's the protocol for getting this prioritized? We've hit it as well. This is a serious issue, and while I understand there's a workaround (hack), it's certainly reducing my confidence in running production workloads on this thing.
I'm also seeing this issue, but I think it's not necessarily an issue with the LB Controller. It seems draining for NLBs doesn't work as I would have expected: instead of stopping new connections and letting existing connections continue, it keeps sending new connections to the draining targets for a while. From my testing, the actual delay for a target to be fully deregistered and drained seems to be around 2-3 minutes. Adding settings along the lines of the sketch below to each container exposed behind an NLB has worked for me so far.
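A minimal sketch of the kind of per-container settings being described, assuming a preStop sleep sized to cover the observed 2-3 minutes of NLB drain time; the names, image, and durations are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nlb-backend-example            # hypothetical name
spec:
  terminationGracePeriodSeconds: 240   # must exceed the preStop sleep
  containers:
    - name: api                        # hypothetical container name
      image: nginx                     # placeholder image
      lifecycle:
        preStop:
          exec:
            # Keep the container serving while the NLB finishes draining the target
            # (~2-3 minutes observed above); 180s is an illustrative value.
            command: ["sh", "-c", "sleep 180"]
```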
I would love to be able to get rid of this but it simply seems that the NLBs are extremely slow in performing management operations. I have even seen target registrations take almost 10 minutes.
I completely agree with what @ardove has said. The point of this readinessGate feature is to delay the termination of the pod for as long as the LB needs it. If I have to update my chart to put a sleep in the preStop hook, then this feature is not working. If I have to use the preStop hook, I might as well not use the readinessGate feature at all.

In my observation, the pod is allowed to terminate as soon as the new target becomes ready/healthy. I have seen the old target still draining after the pod terminates, and obviously that's going to result in 502 errors for those requests.

This feature almost works. Without the feature enabled I see 30 seconds to 1 minute of solid 502 errors. With the feature enabled I get brief sluggishness and maybe one or a handful of 502s. Hopefully you can get this fixed, because unfortunately close to good isn't good enough for something like this.
I thought it might be useful to share this KubeCon talk, "The Gotchas of Zero-Downtime Traffic /w Kubernetes", where the speaker goes into the strategies required for zero-downtime rolling updates with Kubernetes deployments (at least as of 2022): https://www.youtube.com/watch?v=0o5C12kzEDI

It can be a bit hard to conceptualise the limitations of the async nature of Ingress/Endpoint objects and Pod termination, so I found the above talk (and live demo) helped a lot. Hopefully it's useful for others.
@M00nF1sh I am implementing the same in my Kubernetes cluster but am unable to work out the sleep time for the preStop hook and terminationGracePeriodSeconds. Currently terminationGracePeriodSeconds is 120 seconds and the deregistration delay is 300 seconds. Do we have any mechanism to calculate this?
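Going by the earlier comments in this thread (sleep at least as long as the deregistration delay plus a small propagation margin, and a grace period longer than the sleep), here is a sketch using the 300-second delay mentioned above; the margins are assumptions, not official guidance:

```yaml
# Illustrative arithmetic only:
#   target group deregistration delay              = 300s
#   controller + ELB propagation margin (assumed)  =  10s
#   preStop sleep                 >= 300 + 10      = 310s
#   app shutdown after SIGTERM (assumed)           =  20s
#   terminationGracePeriodSeconds >= 310 + 20      = 330s
spec:
  terminationGracePeriodSeconds: 330
  containers:
    - name: app                        # hypothetical container name
      image: example/app:latest        # placeholder image
      lifecycle:
        preStop:
          exec:
            command: ["sh", "-c", "sleep 310"]
```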
Does anyone have an update on this? After almost two years, I cannot see that it has been solved natively yet.
I wonder if finalizers would solve this problem nicely here 🤔
For clusters using Traefik proxy as ingress, it might also be worth looking into the entrypoint lifecycle feature to control graceful shutdowns: https://doc.traefik.io/traefik/routing/entrypoints/#lifecycle
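For context, a sketch of what that Traefik setting looks like in a static configuration file; the timeout values are illustrative:

```yaml
# traefik.yml (static configuration) - timeouts are illustrative
entryPoints:
  web:
    address: ":80"
    transport:
      lifeCycle:
        requestAcceptGraceTimeout: 30s   # keep accepting requests after SIGTERM
        graceTimeOut: 15s                # then wait for in-flight requests to finish
```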
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to its lifecycle rules. You can mark this issue as fresh with /remove-lifecycle stale or close it with /close. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to its lifecycle rules. You can mark this issue as fresh with /remove-lifecycle rotten or close it with /close. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
Would https://kubernetes.io/blog/2022/12/30/advancements-in-kubernetes-traffic-engineering/ help here?
/remove-lifecycle rotten
Bumping this issue. Adding a sleep() does not sound professional; it's a workaround and only a workaround :/
I am experiencing this issue, too.
Any update? Does the pod readiness gate work with v2.6?
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to its lifecycle rules. You can mark this issue as fresh with /remove-lifecycle stale or close it with /close. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues. This bot triages un-triaged issues according to its lifecycle rules. You can mark this issue as fresh with /remove-lifecycle rotten or close it with /close. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten
Hi folks, I wanted to add that I experimented with all the suggested solutions here, and what finally worked for me. I tried an extra sleep during preStop for the container with a matching extra terminationGracePeriod for the pod, reducing the ALB deregistration delay, and explicitly turning the pod healthcheck unhealthy during preStop, among various other experiments and combinations. Even extending the termination to 10 minutes didn't stop traffic from continually flowing from the ALBs, nor the small number of errors right as the pods finished termination.

--> I finally tried switching alb.ingress.kubernetes.io/target-type from ip to instance. After reflecting, I don't know why I thought …
/remove-lifecycle rotten
If anyone faces this, the steps in this long article are what you should follow: https://easoncao.com/zero-downtime-deployment-when-using-alb-ingress-controller-on-amazon-eks-and-prevent-502-error/

This makes the 502/504 errors go away completely.
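For reference, a sketch of the pieces such write-ups typically combine: readiness-gate injection via the namespace label and a shorter deregistration delay on the Ingress. The names and values below are illustrative and not taken from the linked article:

```yaml
# Namespace label that enables automatic readiness-gate injection by the controller;
# the namespace name is illustrative.
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  labels:
    elbv2.k8s.aws/pod-readiness-gate-inject: enabled
---
# Ingress with a lowered ALB deregistration delay; values are illustrative.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  namespace: my-app
  annotations:
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
spec:
  ingressClassName: alb
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80
```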
I did as you described in your article and still have 502/504 issues when I curl my health endpoint every millisecond.
Hi team, I have followed the above steps, but no luck; I am still facing 502s.
Check that you have "sh" in your container. E.g. if you are using gcr.io/distroless/base, ensure that you use the gcr.io/distroless/base:debug-nonroot-amd64 version, which includes /busybox/sh. The preStop setting in your Kubernetes manifest should also be adjusted to use "/busybox/sh".
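A minimal sketch of that adjustment, assuming a distroless :debug image; the sleep duration is illustrative:

```yaml
# Container spec fragment: distroless :debug images ship busybox,
# so the shell lives at /busybox/sh rather than /bin/sh.
lifecycle:
  preStop:
    exec:
      command: ["/busybox/sh", "-c", "sleep 30"]
```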
Hey Stepan, we are using node:lts-alpine and amazoncorretto:21-alpine-jdk images; sh is present in them.
Hi team, is there any solution to this problem? Adding the readiness gate, reducing the exponential backoff, and a preStop hook: none of them helped fix the issue.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues. This bot triages un-triaged issues according to its lifecycle rules. You can mark this issue as fresh with /remove-lifecycle stale or close it with /close. Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale
/remove-lifecycle stale
I'm making use of the Pod Readiness Gate on Kubernetes Deployments running Golang-based APIs. The goal is to achieve full zero downtime deployments.
During a rolling update of the Kubernetes Deployment, I'm getting 502/504 responses from these APIs. This did not happen when setting target-type: instance. I believe the problem is that AWS does not drain the pod from the LB before Kubernetes terminates it.
Timeline of events:
a. AWS begins de-registering/draining the target
b. Kubernetes begins terminating the pod
This is tested with a looping curl command; the results show intermittent 502/504 responses during the rolling update.
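A loop along these lines is a typical way to run this kind of test; the hostname and path are placeholders, not the ones actually used:

```sh
# Continuously probe the service and log any non-2xx response; URL is a placeholder.
while true; do
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 https://my-app.example.com/healthz)
  if [ "$code" -lt 200 ] || [ "$code" -ge 300 ]; then
    echo "$(date -u +%H:%M:%S) got HTTP $code"
  fi
  sleep 0.1
done
```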