
Delay pod deletion to handle deregistraton delay #1775

Closed · wants to merge 1 commit

Conversation

foriequal0

@foriequal0 foriequal0 commented Jan 23, 2021

Pods are deleted during a Deployment rollout and then removed from
Endpoints. When a Pod is deleted, the controller deregisters it from
TargetGroups and the ELB drains its traffic. The delay between Pod
deletion and the propagation of draining is a known cause of ELB 5xx
errors: the ELB keeps sending traffic to terminated Pods, unaware that
they are gone, until they are fully deregistered.

We solve this problem by keeping Pods alive for a delay period. A
well-known workaround attaches a "sleep" command to the preStop
lifecycle hook to keep Pods alive for the delay. However, this is a
duct-tape mitigation, and it can be difficult to hook a "sleep 9999"
into some Deployments depending on their configuration.

In this commit, we use a ValidatingAdmissionWebhook to be notified about
Pods that are about to be deleted, and block their immediate deletion.
This lets us act on them while keeping them alive. We remove all labels
from them, so they are removed from Endpoints and deregistered from
TargetGroups. We also remove ownerReferences so garbage collection
doesn't kick in. Most importantly, the Pods are still alive, so they can
continue to serve new traffic that arrives during the delay. After
keeping them alive for the delay, we can safely delete them, since no
new traffic should be arriving by then.
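For context, the preStop "sleep" workaround mentioned above typically looks like the following (an illustrative sketch, not part of this PR's change; the image name and delay values are placeholders):

```yaml
# Illustrative preStop "sleep" workaround: the container keeps running
# for a fixed delay after deletion starts, giving the ELB time to
# deregister the target before the process receives SIGTERM.
spec:
  terminationGracePeriodSeconds: 120   # must exceed the sleep below
  containers:
    - name: app
      image: example/app:latest        # placeholder image
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "90"]
```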
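A webhook registration for this approach could look roughly like the following (a hedged sketch; the metadata name, webhook name, namespace, service name, and path are placeholders, not taken from this PR):

```yaml
# Sketch of a ValidatingWebhookConfiguration that intercepts Pod DELETE
# requests, so the controller can deny the first deletion, strip labels
# and ownerReferences, and delete the Pod itself after the delay.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: pod-graceful-drain            # placeholder name
webhooks:
  - name: drain.example.com           # placeholder
    admissionReviewVersions: ["v1"]
    sideEffects: NoneOnDryRun
    failurePolicy: Ignore             # don't block deletions if the webhook is down
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["pods"]
        operations: ["DELETE"]
    clientConfig:
      service:
        namespace: kube-system        # placeholder
        name: webhook-service         # placeholder
        path: /validate-pod-deletion  # placeholder
```

Setting failurePolicy to Ignore is the safer choice here: if the webhook itself becomes unavailable, pod deletions proceed normally instead of being blocked cluster-wide.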

You can test with the prebuilt docker image: https://github.com/users/foriequal0/packages/container/aws-load-balancer-controller/901501. I've also forked eks-charts/aws-load-balancer-controller: https://github.com/foriequal0/eks-charts/tree/drain/stable/aws-load-balancer-controller. You can use these helm chart values:

```yaml
podGracefulDrainDelay: 90s
image:
  repository: ghcr.io/foriequal0/aws-load-balancer-controller
  tag: v2.1.1-2-geb716265
```

Please see https://github.com/foriequal0/pod-graceful-drain

closes: #1719
closes: #1065
related?: #1403

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 23, 2021
@k8s-ci-robot
Contributor

Welcome @foriequal0!

It looks like this is your first PR to kubernetes-sigs/aws-load-balancer-controller 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/aws-load-balancer-controller has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot
Contributor

Hi @foriequal0. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 23, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: foriequal0
To complete the pull request process, please assign after the PR has been reviewed.
You can assign the PR to them by writing /assign in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@codecov-io

codecov-io commented Jan 23, 2021

Codecov Report

Merging #1775 (eb71626) into main (3807a2d) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #1775   +/-   ##
=======================================
  Coverage   46.76%   46.76%           
=======================================
  Files         110      110           
  Lines        5988     5988           
=======================================
  Hits         2800     2800           
  Misses       2925     2925           
  Partials      263      263           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3807a2d...eb71626. Read the comment docs.

@foriequal0 foriequal0 changed the title Delay pod deletion to handle deregistraton delay [WIP] Delay pod deletion to handle deregistraton delay Jan 23, 2021
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 23, 2021
@foriequal0 foriequal0 changed the title [WIP] Delay pod deletion to handle deregistraton delay Delay pod deletion to handle deregistraton delay Jan 25, 2021
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Jan 25, 2021
@foriequal0
Author

foriequal0 commented Jan 26, 2021

I think we shouldn't remove all labels. Should I remove only the labels that are matched by the service's selector? Also, should I add an option to delay deletion until draining has fully finished (deregistration_delay.timeout_seconds?) for apps that don't support connection draining? In any case, it works perfectly for me at this point.
I also found this sentence in the comment on the service's selector: "If empty or not present, the service is assumed to have an external process managing its endpoints, which Kubernetes will not modify." In that case, should we edit the Endpoints object directly rather than editing the pod's labels?
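To illustrate the selector question above: a Service selects Pods by only a subset of their labels, so removing just the selected labels would be enough to detach a Pod from the Endpoints (an illustrative example; all names and labels here are placeholders):

```yaml
# This Service matches Pods on "app" and "tier" only. Removing those
# two labels from a Pod detaches it from this Service's Endpoints;
# unrelated labels (e.g. "team") could remain untouched.
apiVersion: v1
kind: Service
metadata:
  name: web                # placeholder
spec:
  selector:
    app: web
    tier: frontend
  ports:
    - port: 80
      targetPort: 8080
```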

@foriequal0
Author

foriequal0 commented Jan 26, 2021

I found that kubectl drain [node], which uses the eviction API, bypasses the admission webhook, so the pod is deleted immediately. The same API is used by aws-node-termination-handler. Curiously, the label change still happens afterwards. Also, kubectl drain [node] --disable-eviction, which uses a normal pod delete, stops deleting the remaining pods at the first failure :/.

I'm starting to think this is hackier than I thought. I should make this a separate project.

@mihasya

mihasya commented Jan 27, 2021

Came here after several days of trying to configure this thing JUST SO and still getting handfuls of 500s on every deploy. Thank you for this work; I really hope a clean way to make this work can be figured out. Of course, drain node is an extremely important use case. If you come up with a solution and need help testing, I would gladly run this PR on our staging environment. Thanks for digging into this!

@foriequal0
Author

foriequal0 commented Feb 5, 2021

@mihasya I've made this a separate package here: https://github.com/foriequal0/pod-graceful-drain
Can you test it? If you have any questions, please open an issue there.
I'll close this PR.
