Increase upgrade remediation to 5 retries
The `TestStableUIDAndGeneration` test is flaky in our CI, but the failure does not show up in a local environment. The CI machine might be overwhelmed by the number of processes that bootstrap the Kubernetes cluster.

This change extends the time Redpanda is given to become up and ready by increasing the Flux upgrade remediation retries from 0 (default) to 5.

In one of the nightly tests, cert-manager seems to have taken longer to create the self-signed certificate than the limit the Flux HelmRelease allows.

Reference

```
helmrepository_controller.go:700: "level"=0 "msg"="artifact up-to-date with remote revision: 'sha256:d5b03c5514669e04ecd4793df2b927bde4309ab8d088e4f8aa52d4e7a9ce2e94'" "controller"="helmrepository" "controllerGroup"="source.toolkit.fluxcd.io" "controllerKind"="HelmRepository" "HelmRepository"={"name"="redpanda-repository" "namespace"="testenv-2cqq8"} "namespace"="testenv-2cqq8" "name"="redpanda-repository" "reconcileID"="e47a07d7-34c9-4532-99cd-798b47d25948"
helmchart_template.go:151: "level"=0 "msg"="HelmChart/testenv-2cqq8/testenv-2cqq8-rp-iex3qh with SourceRef 'HelmRepository/testenv-2cqq8/redpanda-repository' is in-sync" "controller"="helmrelease" "controllerGroup"="helm.toolkit.fluxcd.io" "controllerKind"="HelmRelease" "HelmRelease"={"name"="rp-iex3qh" "namespace"="testenv-2cqq8"} "namespace"="testenv-2cqq8" "name"="rp-iex3qh" "reconcileID"="6480ffe6-2b63-4b0a-b0bd-e0f377670154"
controller.go:324: "msg"="Reconciler error" "error"="error fetching server root CA testenv-2cqq8/rp-iex3qh-default-root-certificate: server TLS certificate not found" "controller"="redpanda" "controllerGroup"="cluster.redpanda.com" "controllerKind"="Redpanda" "Redpanda"={"name"="rp-iex3qh" "namespace"="testenv-2cqq8"} "namespace"="testenv-2cqq8" "name"="rp-iex3qh" "reconcileID"="86702d6b-5859-484b-a1b0-9d847afda18e"
atomic_release.go:419: "level"=0 "msg"="release is in a failed state" "controller"="helmrelease" "controllerGroup"="helm.toolkit.fluxcd.io" "controllerKind"="HelmRelease" "HelmRelease"={"name"="rp-iex3qh" "namespace"="testenv-2cqq8"} "namespace"="testenv-2cqq8" "name"="rp-iex3qh" "reconcileID"="6480ffe6-2b63-4b0a-b0bd-e0f377670154"
controller.go:324: "msg"="Reconciler error" "error"="error fetching server root CA testenv-2cqq8/rp-iex3qh-default-root-certificate: server TLS certificate not found" "controller"="redpanda" "controllerGroup"="cluster.redpanda.com" "controllerKind"="Redpanda" "Redpanda"={"name"="rp-iex3qh" "namespace"="testenv-2cqq8"} "namespace"="testenv-2cqq8" "name"="rp-iex3qh" "reconcileID"="3c29782a-7556-40a6-a8b2-9b5fed123672"
controller.go:324: "msg"="Reconciler error" "error"="terminal error: exceeded maximum retries: cannot remediate failed release" "controller"="helmrelease" "controllerGroup"="helm.toolkit.fluxcd.io" "controllerKind"="HelmRelease" "HelmRelease"={"name"="rp-iex3qh" "namespace"="testenv-2cqq8"} "namespace"="testenv-2cqq8" "name"="rp-iex3qh" "reconcileID"="6480ffe6-2b63-4b0a-b0bd-e0f377670154"
controller.go:324: "msg"="Reconciler error" "error"="error fetching server root CA testenv-2cqq8/rp-iex3qh-default-root-certificate: server TLS certificate not found" "controller"="redpanda" "controllerGroup"="cluster.redpanda.com" "controllerKind"="Redpanda" "Redpanda"={"name"="rp-iex3qh" "namespace"="testenv-2cqq8"} "namespace"="testenv-2cqq8" "name"="rp-iex3qh" "reconcileID"="172e63ab-3553-4c01-a3ba-baca0a338207"
```

https://buildkite.com/redpanda/redpanda-operator/builds/3493#0193901d-c9f9-4b84-a4db-274dec903fe9/1221-2076

```
=== NAME  TestRedpandaController/TestStableUIDAndGeneration
    redpanda_controller_test.go:930: waiting for *v1alpha2.Redpanda "rp-iex3qh" to be ready
    redpanda_controller_test.go:921:
        Error Trace:  /work/operator/internal/controller/redpanda/redpanda_controller_test.go:921
                      /work/operator/internal/controller/redpanda/redpanda_controller_test.go:891
                      /work/operator/internal/controller/redpanda/redpanda_controller_test.go:101
        Error:        Received unexpected error:
                      context deadline exceeded
        Test:         TestRedpandaController/TestStableUIDAndGeneration
```

https://buildkite.com/redpanda/redpanda-operator/builds/3493#0193901d-c9f9-4b84-a4db-274dec903fe9/1221-2178
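
For illustration only, the retry bump described in this message can be sketched in Go. The `Upgrade` and `UpgradeRemediation` types below are simplified, hypothetical stand-ins for the upgrade remediation settings of a Flux HelmRelease, and the helper names are invented for this sketch; the operator's actual code and the real Flux API types may differ.

```go
package main

import "fmt"

// UpgradeRemediation is a hypothetical, simplified stand-in for the
// upgrade remediation settings of a Flux HelmRelease.
type UpgradeRemediation struct {
	// Retries is how many times a failed upgrade is retried
	// before the release is reported as not remediable.
	Retries int
}

// Upgrade is a hypothetical, simplified stand-in for the upgrade
// section of a HelmRelease spec.
type Upgrade struct {
	Remediation *UpgradeRemediation
}

// upgradeRetries returns the configured retry count, or 0 when no
// remediation is set (the default mentioned in the commit message).
func upgradeRetries(u *Upgrade) int {
	if u == nil || u.Remediation == nil {
		return 0
	}
	return u.Remediation.Retries
}

// withUpgradeRetries returns a copy of u with the retry count set.
func withUpgradeRetries(u Upgrade, retries int) Upgrade {
	u.Remediation = &UpgradeRemediation{Retries: retries}
	return u
}

func main() {
	before := Upgrade{}                    // no remediation configured
	after := withUpgradeRetries(before, 5) // value chosen by this commit
	fmt.Println(upgradeRetries(&before), upgradeRetries(&after))
}
```

With more retries, a slow cert-manager no longer pushes the release straight into the `exceeded maximum retries` terminal error seen in the log above.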