Document zero-downtime deployment for IP targets #2131

Closed
kishorj opened this issue Jul 21, 2021 · 43 comments
Labels
kind/documentation: Categorizes issue or PR as related to documentation.
lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@kishorj
Collaborator

kishorj commented Jul 21, 2021

Is your feature request related to a problem?
Document how to set up zero-downtime deployments with the AWS Load Balancer Controller.

Describe the solution you'd like
Documentation with the detailed steps.

Describe alternatives you've considered
N/A

@kishorj
Collaborator Author

kishorj commented Jul 21, 2021

/kind documentation

@k8s-ci-robot added the kind/documentation label Jul 21, 2021
@shubham391

@kishorj Is there a timeline you're targeting to document how to achieve zero-downtime deployments? If not, could you please give some pointers on how this can be achieved?

Looking at the related issues filed, the solutions are mostly around adding a sleep in the preStop step. I'd really appreciate it if you could share your recommendation.

@shubham391

Found this in documentation: https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/deploy/pod_readiness_gate

This describes a deployment scenario where the service can have an outage. I'll give this a try today and see if it solves my case.

@shubham391

Enabling Pod Readiness Gate reduced the 5xx errors, but did not completely eliminate them.

Found this issue #1719 (comment) where @M00nF1sh has explained the breakdown of things to consider when deciding the preStop sleep value. After setting an appropriate value in preStop, I'm able to deploy without any errors.

It was also suggested in one of the issues to enable Graceful Shutdown in the server, but I found that if the preStop sleep is high enough, not doing graceful shutdown is also fine, since the pod will get fully deregistered from the LB during the sleep phase itself. So by the time the server receives the TERM signal, the LB will already have stopped sending new requests to the pod (and in-flight requests will have completed). It's still good to enable graceful shutdown in case there are any other edge cases.
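A minimal sketch of that combination (readiness gate injection on the namespace plus a preStop sleep); the namespace, image, and the 40-second value below are placeholders, and the sleep should be tuned per the breakdown in #1719:

---
apiVersion: v1
kind: Namespace
metadata:
  name: my-app            # hypothetical namespace
  labels:
    # lets the AWS Load Balancer Controller inject target-group readiness gates
    elbv2.k8s.aws/pod-readiness-gate-inject: enabled
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
  namespace: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # must cover the preStop sleep plus the app's own shutdown time
      terminationGracePeriodSeconds: 60
      containers:
      - name: app
        image: my-app:latest   # hypothetical image
        lifecycle:
          preStop:
            exec:
              # keep serving while the pod deregisters from the target group
              command: ["/bin/sh", "-c", "sleep 40"]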

@keperry
Contributor

keperry commented Aug 26, 2021

I did create an article about this a while back. https://aws.plainenglish.io/6-tips-to-improve-availability-with-aws-load-balancers-and-kubernetes-ad8d4d1c0f61
Essentially the steps are:

  1. Handle Shutdown Gracefully
  2. Calibrate Your Timings
  3. Add Pod Anti-Affinity to your Deployment
  4. Use Pod-Readiness Gates
  5. Use The AWS Load Balancer Controller Directly (no Nginx controller or Haproxy controller)
  6. Monitor and Measure Everything!
  7. Use PodDisruptionBudgets

I would be curious if anyone else has any additional tips.
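For tips 3 and 7, a rough sketch of pod anti-affinity and a PodDisruptionBudget (the names, label, and 33% budget below are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app              # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # prefer spreading replicas across nodes so a single node failure
          # or drain doesn't take out all targets at once
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: kubernetes.io/hostname
      containers:
      - name: app
        image: my-app:latest  # hypothetical image
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  # limit how many replicas can be disrupted voluntarily at the same time
  maxUnavailable: 33%
  selector:
    matchLabels:
      app: my-app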

@shubham391

@keperry Thanks for sharing, that was very helpful.

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Nov 25, 2021
@project0
Contributor

project0 commented Dec 8, 2021

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Dec 8, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Mar 8, 2022
@project0
Contributor

project0 commented Mar 8, 2022

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Mar 8, 2022
@sjmiller609

sjmiller609 commented Apr 12, 2022

I haven't got it working yet. Just a simple replacement of the pod (for example, changing from image: nginx to image: httpd) still causes some connections to drop.

---
apiVersion: v1
kind: Namespace
metadata:
  name: test-nlb-ip
  labels:
    # https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.1/deploy/pod_readiness_gate/
    elbv2.k8s.aws/pod-readiness-gate-inject: enabled
---
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  namespace: test-nlb-ip
  labels:
    run: my-nginx
  annotations:
    external-dns.alpha.kubernetes.io/hostname: nginx.test.REPLACE_ME.com
    service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: deregistration_delay.timeout_seconds=30,deregistration_delay.connection_termination.enabled=true,preserve_client_ip.enabled=true
    service.beta.kubernetes.io/aws-load-balancer-internal: "false"
    service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
    service.beta.kubernetes.io/aws-load-balancer-type: "external"
    service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
    # service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
    # service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
    # service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
    # service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
spec:
  # externalTrafficPolicy: Local
  # externalTrafficPolicy: Cluster
  type: LoadBalancer
  ports:
  - port: 80
    protocol: TCP
  selector:
    run: my-nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
  namespace: test-nlb-ip
spec:
  strategy:
    rollingUpdate:
      maxUnavailable: "33%"
  selector:
    matchLabels:
      run: my-nginx
  replicas: 5
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - name: my-nginx
        image: httpd
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "sleep 60"]
        ports:
        - name: http
          containerPort: 80
        readinessProbe:
          httpGet:
            path: /
            port: http
          failureThreshold: 1
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /
            port: http
          failureThreshold: 1
          periodSeconds: 10
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-nginx-pdb
  # the PDB must be in the same namespace as the pods it selects
  namespace: test-nlb-ip
spec:
  maxUnavailable: 33%
  selector:
    matchLabels:
      run: my-nginx

Testing with

import requests
from time import sleep, time

hits = 0
miss = 0
average_rtt = 0
count = 0
while True:
    try:
        start = time()
        response = requests.get("http://nginx.test.REPLACE_ME.com/", timeout=10)
        # response = requests.get("http://localhost:8080/", timeout=10)
        end = time()
        milliseconds = (end - start) * 1000
        # rolling average over (roughly) the last 25 samples
        average_rtt = (average_rtt * count + milliseconds) / (count + 1)
        count += 1
        if count > 25:
            count = 25
    except requests.RequestException:
        response = None
    if response and response.status_code == 200:
        hits += 1
    else:
        miss += 1
    print(f"hits: {hits} misses: {miss} avg rtt: {int(average_rtt)} ms")

Version 2.4, EKS 1.20

@keperry
Contributor

keperry commented Apr 12, 2022

@sjmiller609 - are you signalling that the pod should no longer take traffic by having the readiness probe throw a 500 during the "shutdown wait" period? I can't quite tell if your app is doing that. It looks like the "sleep" is handling the "shutdown wait", but if nothing signals the readiness probe (the readiness probe must fail), kube will keep sending traffic there. Additionally, I would explicitly set the timeout for your readiness probe.
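One possible shape for this, as a rough sketch (the /tmp/draining flag path and the timings are made up): the preStop hook marks the pod as draining and then sleeps, while an exec readiness probe fails once the flag exists, so Kubernetes stops routing new traffic even though the app port keeps serving:

containers:
- name: app
  image: my-app:latest          # hypothetical image
  lifecycle:
    preStop:
      exec:
        # mark the pod as draining, then keep it alive while the LB drains
        command: ["/bin/sh", "-c", "touch /tmp/draining && sleep 60"]
  readinessProbe:
    exec:
      # fail readiness as soon as the drain flag exists; a real probe would
      # also check the app endpoint while the flag is absent
      command: ["/bin/sh", "-c", "test ! -f /tmp/draining"]
    periodSeconds: 5
    timeoutSeconds: 3
    failureThreshold: 2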

@sjmiller609

Thanks, I think this is what I'm missing. I will give this a shot right now!

@sjmiller609

I'm giving this a go, but I'm not sure it's quite right, because I think you are saying the workload should continue serving regular traffic and only fail the readiness probe.

        lifecycle:
          preStop:
            exec:
              command:
              - "/bin/sh"
              - "-c"
              - |
                  nginx -c /etc/nginx/nginx.conf -s quit
                  while pgrep -x nginx; do
                    sleep 1
                  done

@sjmiller609

Since I will have to work out the details in the workload, I will replace my demo service with my actual ingress controller and then report back.

@sjmiller609

sjmiller609 commented Apr 12, 2022

I think the intended order of events is:

  • New pods are launched
  • The update policy allows all of them to be launched at the same time
  • The readiness gate applies to the new pods
  • Waiting on initial setup with the NLB
  • NLB initial setup is ready
  • Pods become ready because they pass the readiness gate
  • Old pods are marked as terminating, triggered by the other pods being ready
  • Drain starts on the NLB immediately
  • preStop hook is executed: sleep 180 seconds
    • This is to work around the limitation that the NLB may continue to send traffic to a draining target for up to 180 seconds
  • Drain completes before 180 seconds
  • preStop hook finishes sleeping
  • SIGTERM is sent to the pod
  • terminationDrainDuration is applied
    • Istio-specific concept
  • 10s for any remaining connections to close; existing connections are force-closed by Istio
  • NLB reaches the deregistration delay after a total of 300 seconds
  • NLB closes any remaining connections

It seems like in my case, my workload can just sleep for 180 seconds and doesn't need to be customized for the readiness probe. It's just about waiting long enough to satisfy the limitation of the AWS NLB.

If the deregistered target stays healthy and an existing connection is not idle, the load balancer can continue to send traffic to the target. To ensure that existing connections are closed, you can do one of the following: enable the target group attribute for connection termination, ensure that the instance is unhealthy before you deregister it, or periodically close client connections.

I'm trying to understand the purpose of @keperry's suggestion, and I'm guessing the reasoning is that by setting readiness to fail, the AWS LB controller will mark the target as unhealthy (not sure?). That would satisfy the condition in the quote above to "ensure that the instance is unhealthy before you deregister it".

References:

Other notes:

  • I was finding "random" latency spikes that I was having trouble working out. My monitoring script was confusing me because I was being rate limited by my DNS. To fix, hardcode your NLB's address in /etc/hosts while running the testing script.

I will post my manifests below that I used to get it working in my case.

@sjmiller609

Not shown:

  • Install Istio Operator

The manifests below were working in my test of running the monitoring script while doing a "kubectl rollout restart deployments -n istio-system". I think they are not the minimal configuration.

Istio configuration:

---
apiVersion: v1
kind: Namespace
metadata:
  name: istio-system
  labels:
    # https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.1/deploy/pod_readiness_gate/
    elbv2.k8s.aws/pod-readiness-gate-inject: enabled
---
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-default
  namespace: istio-system
spec:
  meshConfig:
    defaultConfig:
      # The amount of time allowed for connections to complete on proxy shutdown.
      # On receiving SIGTERM or SIGINT, istio-agent tells the active Envoy to
      # start draining, preventing any new connections and allowing existing
      # connections to complete. It then sleeps for the
      # termination_drain_duration and then kills any remaining active
      # Envoy processes. If not set, a default of 5s will be applied.
      #
      # This process will occur after the preStop lifecycle hook.
      # https://cloud.google.com/blog/products/containers-kubernetes/kubernetes-best-practices-terminating-with-grace
      terminationDrainDuration: 10s
  components:
    ingressGateways:
    - enabled: true
      k8s:
        overlays:
        - kind: Deployment
          name: istio-public-ingressgateway
          patches:
          - path: spec.template.spec.containers[name:istio-proxy].lifecycle.preStop.exec.command
            # NLB may continue routing traffic for up to 180 seconds after
            # the endpoint is marked as 'draining' in the NLB.
            # We sleep before initiating shutdown to allow NLB connections
            # to stop coming to the container.
            value:
              - "/bin/sh"
              - "-c"
              - "sleep 180"
          - path: spec.template.spec.terminationGracePeriodSeconds
            # We allow the preStop sleep duration, plus the
            # terminationDrainDuration, plus 10 seconds to terminate.
            value: 200
        podDisruptionBudget:
          maxUnavailable: 33%
        strategy:
          rollingUpdate:
            maxSurge: 100%
            maxUnavailable: 0
        hpaSpec:
          minReplicas: 5
          maxReplicas: 10
        service:
          # Don't configure this section for a real cluster,
          # this configuration present to dodge need of HTTPS,
          # since AWS LB controller will inject pod readiness gates
          # for each port on the service.
          ports:
          - name: http2
            port: 80
            protocol: TCP
            targetPort: 8080
        # service:
        #   externalTrafficPolicy: Local
        serviceAnnotations:
          external-dns.alpha.kubernetes.io/hostname: ha.test.REPLACE_ME.com
          service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: deregistration_delay.timeout_seconds=200,deregistration_delay.connection_termination.enabled=true,preserve_client_ip.enabled=true
          service.beta.kubernetes.io/aws-load-balancer-internal: "false"
          service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
          service.beta.kubernetes.io/aws-load-balancer-type: "external"
          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: "ip"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
          # service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
          # service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
      name: istio-public-ingressgateway
    - enabled: false
      name: istio-ingressgateway
  hub: gcr.io/istio-release
  profile: default

Istio routing configuration (Gateway and VirtualService):

---
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: istio-public-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "ha.test.REPLACE_ME.com"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-nginx
  namespace: istio-system
spec:
  hosts:
  - "ha.test.REPLACE_ME.com"
  gateways:
  - istio-public-gateway
  http:
  - route:
    - destination:
        host: my-nginx.test-nlb-ip.svc.cluster.local

Nginx

---
apiVersion: v1
kind: Namespace
metadata:
  name: test-nlb-ip
  labels:
    istio-injection: enabled
---
apiVersion: v1
kind: Service
metadata:
  name: my-nginx
  namespace: test-nlb-ip
  labels:
    run: my-nginx
spec:
  type: ClusterIP
  ports:
  - port: 80
    protocol: TCP
  selector:
    run: my-nginx
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx
  namespace: test-nlb-ip
spec:
  strategy:
    rollingUpdate:
      maxSurge: 100%
      maxUnavailable: 0
  selector:
    matchLabels:
      run: my-nginx
  replicas: 5
  template:
    metadata:
      labels:
        run: my-nginx
    spec:
      containers:
      - name: my-nginx
        image: nginx
        lifecycle:
          preStop:
            exec:
              command:
              - "/bin/sh"
              - "-c"
              - |
                  nginx -c /etc/nginx/nginx.conf -s quit
                  while pgrep -x nginx; do
                    sleep 1
                  done
                  echo "done"
        ports:
        - name: http
          containerPort: 80
        readinessProbe:
          httpGet:
            path: /
            port: http
          failureThreshold: 2
          timeoutSeconds: 5
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /
            port: http
          failureThreshold: 2
          timeoutSeconds: 5
          periodSeconds: 10
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: my-nginx-pdb
  namespace: test-nlb-ip
spec:
  maxUnavailable: 33%
  selector:
    matchLabels:
      run: my-nginx

@project0
Contributor

@sjmiller609 tl;dr: check out this workaround: #1719 (comment)

@sjmiller609

Update: this configuration has been working perfectly for a few weeks:

apiVersion: v1
kind: Namespace
metadata:
  name: istio-config
---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    elbv2.k8s.aws/pod-readiness-gate-inject: enabled
  name: istio-system
---
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-default
  namespace: istio-system
spec:
  components:
    ingressGateways:
    - enabled: true
      k8s:
        hpaSpec:
          maxReplicas: 15
          minReplicas: 5
        nodeSelector:
          spotinst.io/node-lifecycle: od
        overlays:
        - kind: Deployment
          name: istio-public-ingressgateway
          patches:
          - path: spec.template.spec.containers[name:istio-proxy].lifecycle.preStop.exec.command
            value:
            - /bin/sh
            - -c
            - sleep 180
          - path: spec.template.spec.terminationGracePeriodSeconds
            value: 200
          - path: spec.template.metadata.labels.spotinst\.io/restrict-scale-down
            value: "true"
        podAnnotations:
          ad.datadoghq.com/tags: '{"source": "envoy", "service": "istio-public-ingressgateway"}'
        podDisruptionBudget:
          maxUnavailable: 20%
        serviceAnnotations:
          external-dns.alpha.kubernetes.io/hostname: platform.getcerebral.com,portal.getcerebral.com
          service.beta.kubernetes.io/aws-load-balancer-cross-zone-load-balancing-enabled: "true"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-healthy-threshold: "2"
          service.beta.kubernetes.io/aws-load-balancer-healthcheck-unhealthy-threshold: "2"
          service.beta.kubernetes.io/aws-load-balancer-internal: "false"
          service.beta.kubernetes.io/aws-load-balancer-nlb-target-type: ip
          service.beta.kubernetes.io/aws-load-balancer-scheme: internet-facing
          service.beta.kubernetes.io/aws-load-balancer-target-group-attributes: deregistration_delay.timeout_seconds=200,deregistration_delay.connection_termination.enabled=true,preserve_client_ip.enabled=false
          service.beta.kubernetes.io/aws-load-balancer-type: external
        strategy:
          rollingUpdate:
            maxSurge: 100%
            maxUnavailable: 0
      name: istio-public-ingressgateway
    - enabled: false
      name: istio-ingressgateway
    pilot:
      k8s:
        hpaSpec:
          maxReplicas: 10
          minReplicas: 3
        resources:
          limits:
            cpu: 2000m
            memory: 2Gi
          requests:
            cpu: 500m
            memory: 2Gi
        serviceAnnotations:
          ad.datadoghq.com/endpoints.check_names: '["istio"]'
          ad.datadoghq.com/endpoints.init_configs: '[{}]'
          ad.datadoghq.com/endpoints.instances: |
            [
              {
                "istiod_endpoint": "http://%%host%%:15014/metrics",
                "use_openmetrics": true
              }
            ]
  hub: gcr.io/istio-release
  meshConfig:
    accessLogFile: /dev/stdout
    defaultConfig:
      terminationDrainDuration: 10s
    extensionProviders:
    - envoyExtAuthzHttp:
        headersToDownstreamOnDeny:
        - uid
        - client
        - access-token
        headersToUpstreamOnAllow:
        - uid
        - client
        - access-token
        includeHeadersInCheck:
        - uid
        - client
        - access-token
        pathPrefix: /api/v1/auth/istio
        port: "80"
        service: auth-service.apps.svc.cluster.local
      name: auth-service
  profile: default

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jul 28, 2022
@project0
Contributor

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jul 29, 2022
@Constantin07

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jun 21, 2023
@luisiturrios1

luisiturrios1 commented Jul 25, 2023

Is there any way to define terminationGracePeriodSeconds with the Helm installation?

@dvbthijsvnuland

Unfortunately not; we used kustomize on top of helm (MacGyver solution?)
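Roughly what that kustomize-on-top-of-helm approach can look like, as a sketch (the chart, release, and version below are placeholders; inflating Helm charts from kustomize requires --enable-helm):

# kustomization.yaml (sketch)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: istio-system
helmCharts:
- name: gateway                # hypothetical: the Istio gateway chart
  repo: https://istio-release.storage.googleapis.com/charts
  releaseName: istio-ingressgateway
  version: 1.20.0
patches:
- target:
    kind: Deployment
    name: istio-ingressgateway
  patch: |-
    # add the field the chart does not expose as a value
    - op: add
      path: /spec/template/spec/terminationGracePeriodSeconds
      value: 200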

@clayvan

clayvan commented Nov 4, 2023

@woehrl01 Could you elaborate on your setup with distroless Istio proxies, as I don't see a way to achieve it without some form of preStop?

See my comment here istio/istio#47265 (comment)

But I don't see how MINIMUM_DRAIN_DURATION achieves the same thing as a preStop sleep.

@woehrl01

woehrl01 commented Nov 4, 2023

@clayvan you're right, and I apologize for not updating this thread. Even though the config I mentioned above did work a few times, it's not reliable for achieving zero downtime on AWS with an NLB. The only way we achieved this is by injecting the already mentioned preStop hook.

@meisfrancis

To fix this issue when using Istio + NLB (IP targets), here are the working defaults.

Ingress gateway Deployment:

terminationGracePeriodSeconds: 300
podAnnotations:
  proxy.istio.io/config: |
    drainDuration: 300s
    parentShutdownDuration: 301s
    terminationDrainDuration: 302s

@hariomsaini, this solution might not work when using the Istio Gateway Helm chart, because the pipe operator (|) in Istio refers to an object. The following patch is for customizing the Istio Deployment:

apiVersion: builtin
kind: PatchTransformer
metadata:
  name: patch-graceful-shutdown
target:
  kind: IstioOperator
patch: |
  - op: add
    path: /spec/components/ingressGateways/0/k8s/overlays/0/patches/-
    value:
      path: spec.template.metadata.annotations.proxy\.istio\.io/config
      value: |
        drainDuration: 360s
        parentShutdownDuration: 361s
        terminationDrainDuration: 362s

The IstioOperator manifest will now be:

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: default-istiocontrolplane
  namespace: istio-system
spec:
  components:
    ingressGateways:
    - enabled: true
      k8s:
        hpaSpec:
          maxReplicas: 30
          minReplicas: 3
        overlays:
        - kind: Deployment
          name: istio-ingressgateway
          patches:
          - path: spec.template.metadata.annotations.proxy\.istio\.io/config
            value: |
              drainDuration: 360s
              parentShutdownDuration: 361s
              terminationDrainDuration: 362s

This is an invalid manifest, because proxy.istio.io/config needs a string value but it receives an object on account of the behavior of (|) in Istio. Read here: https://istio.io/latest/docs/setup/additional-setup/customize-installation/#patching-the-output-manifest

@gerasym

gerasym commented Jan 31, 2024

Did anyone achieve zero downtime with the instance target type?
AFAIK the IP target type works only with the AWS VPC CNI; is there a solution for NLB (instance targets) and any other CNI?

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Apr 30, 2024
@Constantin07

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Apr 30, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Jul 29, 2024
@cbugneac-nex

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label Jul 29, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label Oct 27, 2024
@jtnz

jtnz commented Nov 21, 2024

FWIW there's this documentation, which is the most "complete" I'm aware of.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Dec 21, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue with /reopen
  • Mark this issue as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot closed this as not planned Jan 20, 2025