NLB PodReadinessGate instability #4034
How are you creating the namespace? It's very important that the namespace, with the readiness gate injection label, is created before any of the workload resources.
Yeah, we are creating the namespace before everything else, with the proper label. In addition to that, we have a canary deployment with an ALB in the same namespace and everything works well, so it's definitely not a namespace issue.
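(For context, readiness gate injection by the AWS Load Balancer Controller is opt-in per namespace. A labeled namespace would look roughly like this; `my-app` is a placeholder name, while the label key and value are from the controller docs:)

```yaml
# Namespace opted in to LBC readiness gate injection.
# "my-app" is an illustrative name.
apiVersion: v1
kind: Namespace
metadata:
  name: my-app
  labels:
    elbv2.k8s.aws/pod-readiness-gate-inject: enabled
```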
What is the order in which you create the deployment and the service (or ingress)? If you create the deployment first, your initial pods can come up with no readiness gates, because at creation time they were not yet associated with a load balancer. If you create the ingress or service first and then the deployment, the pods are already associated with the load balancer when they are created, and so readiness gates get attached. These are the scenarios I tested with an NLB:
- Deployment created first, then the service: the initial pods came up with no readiness gate.
- Service created first, then the deployment: the initial pods came up with a readiness gate.
@zac-nixon I've tested both approaches, and the results are the same. We operate with ReplicaSets and a Service: the Service is created first, and the ReplicaSets after it. On each new deployment of the service a new ReplicaSet is created, while the Service does not change at all. In both scenarios the result is the same: the first pod in the new ReplicaSet does not get the readiness gate injected. When I restart pods within the same ReplicaSet, all of them get the readiness gate; the problem only shows up when a new ReplicaSet is created while all other objects (including the Service) are unchanged.
I think I see the issue, and it relates to how Kubernetes handles eventual consistency. I can repro the same behavior:
svc.yaml
rs.yaml
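(The attached manifests were not preserved in this thread; below is a minimal sketch of what such a repro pair might look like. All names, ports, selectors, and the image are illustrative, and `loadBalancerClass: service.k8s.aws/nlb` assumes a controller version that supports it:)

```yaml
# svc.yaml - illustrative NLB Service managed by the LBC
apiVersion: v1
kind: Service
metadata:
  name: demo
  namespace: my-app
spec:
  type: LoadBalancer
  loadBalancerClass: service.k8s.aws/nlb
  selector:
    app: demo
  ports:
    - port: 80
      targetPort: 80
---
# rs.yaml - illustrative ReplicaSet matching the Service selector
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: demo
  namespace: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
        - name: app
          image: nginx:1.25
          ports:
            - containerPort: 80
```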
The pods come up with no readiness gates attached. The issue is due to eventual consistency within Kubernetes and the LBC. I was able to solve it by adding a sleep between the commands, to ensure the LBC has a correctly warmed cache before processing each operation.
I know it's not a great solution, but there are some pretty serious architectural limitations at play. Can you try with some time between each of the operations?
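(In shell form, the suggested workaround amounts to something like the following; the sleep duration is a guess and may need tuning for your environment:)

```sh
# Create the Service first, then give the LBC time to observe it
kubectl apply -f svc.yaml
sleep 60   # arbitrary pause; long enough for the LBC to warm its cache
kubectl apply -f rs.yaml
```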
@zac-nixon hi, this workaround works fine. It's not the best solution, but it works.
I'm in favor of leaving this open, as it's a legitimate issue, although I do not have the time to work on a proper fix at the moment.
Describe the bug
We are using the Argo Rollouts canary strategy with an NLB operated at the Service level. During deployment of a new version, the first pod is missing the readiness gate, while the second one scheduled in the same service does have it. It looks like the load balancer controller misses the injection for the first pod.
In the screenshot (not reproduced here), the first (older) pod has no readiness gate, while the next one does.
Steps to reproduce
kubectl get pod -l 'app.kubernetes.io/name=<app_name>' -o wide -n <namespace>
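(To inspect the injected readiness gates directly, something along these lines should work; `<pod_name>` is a placeholder, and the LBC's condition type is prefixed with `target-health.elbv2.k8s.aws`:)

```sh
# Print the readiness gate condition types on a pod
# (empty output means no gate was injected)
kubectl get pod <pod_name> -n <namespace> \
  -o jsonpath='{.spec.readinessGates[*].conditionType}'
```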
Expected outcome
The readiness gate is applied to the first pod during the rollout process.
Environment
Service type: LoadBalancer
Annotations and ports configuration:
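(The original annotations were not captured in this text; a typical LBC-managed NLB Service uses annotations along these lines. The keys are from the controller docs, and the values are illustrative:)

```yaml
# Illustrative Service annotations for an LBC-managed NLB
service.beta.kubernetes.io/aws-load-balancer-type: external
service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
```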
As a result, two load balancers, two services, and six TargetGroupBindings are created.
Additional Context: