Brokers became unreachable during CA rotation #11096
mbrembilla
asked this question in
Q&A
-
Why did the brokers become unreachable for clients? What exactly does that even mean? What were the errors? Without full logs from all the components and all the configurations, nobody will be able to give you any answers.
-
Hello,
I’m running a Kafka cluster on Kubernetes using Strimzi, and I recently encountered a disruption during a CA rotation.
Here’s a quick summary of the issue:
• My cluster-operator started to update the CA certificates for Zookeeper.
• Zookeeper restarted and updated its certificates without apparent errors.
• Right after Zookeeper was updated, all my Kafka brokers became unreachable for clients, causing a service downtime.
• The outage lasted until all brokers finished restarting with the new certificates.
My expectation was that Strimzi’s rolling update during CA rotation would prevent a full outage, but that didn’t happen. Instead, the brokers appeared to be unreachable to my clients for the entire transition period.
I would really appreciate help figuring out what might have gone wrong and how to avoid it in the future. Specifically:
1. Is this an expected scenario if the CA rotation steps aren’t followed in a certain order?
2. Could there be a misconfiguration or a known issue that prevents Strimzi’s rolling restart from maintaining client connectivity?
3. Are there recommended best practices or prerequisites I should verify before triggering a CA rotation?
Strimzi version: 0.39
Kafka version: 3.6.1
Kubernetes version: v1.31.4-eks-2d5f260
If you need additional details, please let me know:
• Which logs would you like to see? (Operator logs, broker logs, Zookeeper logs, etc.)
• What exact configurations or CRD details can I provide (Kafka CR, KafkaUser CR, etc.)?
• Any information about Strimzi, Kafka or Kubernetes environment specifics that might be relevant?
I’d be happy to share any data or logs that could help diagnose the problem. Thank you in advance for your assistance; I look forward to any suggestions or insights you can provide.
Regards,
Mauro