Cluster restarts in sequence when config changes, rather than in parallel #1082

Open

peeveen opened this issue Feb 5, 2025 · 0 comments

peeveen commented Feb 5, 2025

I have a Helm chart containing a CrdbCluster resource.

This Helm chart creates certificate secrets (CA, node, root), then uses the names of those secrets in the nodeTLSSecret and clientTLSSecrets values of the CrdbCluster resource config.
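For context, the relevant part of the resource looks roughly like this (a trimmed sketch, not my actual manifest; the secret names are placeholders, and the field names are as I understand the CrdbCluster CRD):

```yaml
apiVersion: crdb.cockroachlabs.com/v1alpha1
kind: CrdbCluster
metadata:
  name: cockroachdb
spec:
  tlsEnabled: true
  nodes: 3
  # Names of the Secrets created elsewhere in the chart
  nodeTLSSecret: my-release-node-certs
  clientTLSSecret: my-release-root-certs
```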

If the secrets already exist (found using Helm's lookup function), the existing certificate data is used. In this way, repeating the helm install or helm upgrade commands won't keep creating new certificates.
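The reuse logic looks roughly like this (a simplified sketch of the CA secret template; the real names and keys differ):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-release-ca-certs
type: Opaque
data:
  {{- $existing := lookup "v1" "Secret" .Release.Namespace "my-release-ca-certs" }}
  {{- if $existing }}
  # Secret already exists in the cluster: reuse its (already base64-encoded) data
  ca.crt: {{ index $existing.data "ca.crt" }}
  ca.key: {{ index $existing.data "ca.key" }}
  {{- else }}
  # First install (or secret was deleted): generate a fresh CA
  {{- $ca := genCA "cockroachdb-ca" 3650 }}
  ca.crt: {{ $ca.Cert | b64enc }}
  ca.key: {{ $ca.Key | b64enc }}
  {{- end }}
```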

However, if I manually delete the CA certificate secret, the chart generates all-new certificate secrets, albeit with the same names.
Because the secret names are unchanged, the CrdbCluster config does not change, so the DB pods keep running with the old certificate data in their cockroach-certs folders.
This is not what I'm aiming for: I want the DB pods to be rebuilt when the certificates change.

So I added some additionalAnnotations to the CrdbCluster resource, containing the SHA1 hashes of the certificate data.
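Simplified, it looks like this ($certs here stands for the concatenated certificate data; the annotation key is arbitrary):

```yaml
spec:
  additionalAnnotations:
    # New certificate data -> new hash -> changed CrdbCluster spec
    checksum/certs: {{ $certs | sha1sum }}
```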
This looked like it was going to work, but something (the operator?) restarts the cluster pods one by one rather than all at once.
So the first pod restarts with the new certificate data, but it doesn't report as "healthy" until it can successfully contact the other cluster pods, which fails because they still hold the old certificates. The operator won't move on to rebuilding the next DB pod until the first one reports as healthy, so the rollout stalls indefinitely.
The only way I can get them all to restart is to manually delete them.

Is it possible for the operator to restart them all at once?
(Actually, I'm assuming it's the operator that's controlling this, and not some fundamental Kubernetes component ... maybe you could confirm this.)
