
[bitnami/kafka] Cannot easily scale up kafka brokers - INCONSISTENT_CLUSTER_ID #31404

Closed

nljakubiak opened this issue Jan 16, 2025 · 3 comments

Labels: kafka, solved, tech-issues (The user has a technical issue about an application)

nljakubiak (Contributor) commented Jan 16, 2025

Name and Version

bitnami/kafka 31.2.0

What architecture are you using?

amd64

What steps will reproduce the bug?

  1. Create a new environment with a kafka setup of 3 controller-only nodes and 3 broker-only nodes. (All parameters I've changed from the default values.yaml shouldn't affect this in any way, because I'm using TLS instead of JKS, set up more users, etc.) Minimal working example with the changes (stored in kafka_values.yaml):
controller:
  controllerOnly: true
broker:
  replicaCount: 3
  2. Install via the helm chart cloned locally by running "helm install -n test -f kafka_values.yaml kafka ./kafka-31.2.0".
  3. 6 pods get created and run without any issues.
  4. Create a topic and produce some data into it. The topic name and data don't matter.
# create topic
kafka-topics.sh --create --topic test --bootstrap-server kafka.test.svc.cluster.local:9092 --command-config /tmp/client.properties --replication-factor 3 --partitions 1
# produce data
kafka-console-producer.sh --topic test --bootstrap-server kafka.test.svc.cluster.local:9092 --producer.config /tmp/client.properties
# type a few lines of arbitrary data, hitting "enter" between them, then exit the producer.
# check the data is there, then exit the consumer
kafka-console-consumer.sh --topic test --from-beginning --bootstrap-server kafka.test.svc.cluster.local:9092 --consumer.config /tmp/client.properties
  5. Try to scale up the number of brokers by changing broker.replicaCount to 4, and run helm upgrade (see the command below).
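
For reference, the upgrade in step 5 would look like this (my reconstruction, assuming the same release name, namespace, and locally cloned chart as the install in step 2):

# after changing broker.replicaCount from 3 to 4 in kafka_values.yaml
helm upgrade -n test -f kafka_values.yaml kafka ./kafka-31.2.0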

Are you using any custom parameters or values?

First install:

controller:
  controllerOnly: true
broker:
  replicaCount: 3

Upgrade later:

controller:
  controllerOnly: true
broker:
  replicaCount: 4

What is the expected behavior?

After increasing the number of brokers, the new brokers should join the existing cluster.

What do you see instead?

The new kafka-broker-3 pod ends up in a restart loop, with errors in the logs:

ERROR [RaftManager id=103] Unexpected error INCONSISTENT_CLUSTER_ID in FETCH response: InboundResponse(correlationId=1, data=FetchResponseData(throttleTimeMs=0, errorCode=104, sessionId=0, responses=[]), sourceId=0) (org.apache.kafka.raft.KafkaRaftClient)

Upon investigation, checking the env variables showed that all broker pods (even the failing one, kafka-broker-3) have

KAFKA_KRAFT_CLUSTER_ID=edU0ZBVBw4YH0bL0nXDxik

Further investigation showed that in the file /bitnami/kafka/data/meta.properties the first 3 brokers have one cluster.id, WltRK7E1MVDcysEbLbO5SA, while the 4th (newly added) pod has edU0ZBVBw4YH0bL0nXDxig.
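
A quick way to compare the persisted IDs across pods (a sketch assuming the default data path and the pod names/namespace from this setup):

# print the stored cluster.id of each broker pod
for i in 0 1 2 3; do
  kubectl exec -n test kafka-broker-$i -- grep cluster.id /bitnami/kafka/data/meta.properties
done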

It seems that during initial cluster creation there is some synchronization of the cluster ID between the nodes, but when the cluster is extended later this does not happen out of the box.

Additional information

Upon deletion of /bitnami/kafka/data/meta.properties from the PVC, a new file is created when the pod restarts, and at that point the cluster.id in meta.properties is the same as KAFKA_KRAFT_CLUSTER_ID. Since that still differs from the cluster.id of the existing brokers, it remains impossible to add the new broker via the helm chart alone.

I didn't test the behavior of meta.properties when kraft.clusterId is set. That might be helpful for newly created clusters, but not for already existing ones.
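
For illustration, pinning the ID via the chart's kraft.clusterId value might look like the following (untested here, as noted above; the value is the existing brokers' cluster.id from the investigation):

controller:
  controllerOnly: true
broker:
  replicaCount: 4
kraft:
  clusterId: "WltRK7E1MVDcysEbLbO5SA"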


github-actions bot commented Feb 7, 2025

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

nljakubiak (Contributor, Author) commented

Do you folks have any ideas on how to fix it? I can propose some solutions. Some options:

  • If it's possible to change an existing cluster's clusterId: an init script can run a sed command on the meta.properties file to overwrite the current cluster.id value. (This should work for both old and new clusters; see the sketch after this list.)
  • If it's impossible to change an existing clusterId, that might be a "breaking change":
    • for new clusters, the init script should set it based on the generated env variable (KAFKA_KRAFT_CLUSTER_ID);
    • to avoid breaking existing clusters, a new helm variable could override the value the sed command writes (keeping the value that's already there instead of KAFKA_KRAFT_CLUSTER_ID), so existing clusters don't break and can still be scaled up.
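
A minimal sketch of the first option, assuming it runs in an init script before the broker starts (my illustration of the proposal, not the chart's actual logic):

#!/bin/bash
# Rewrite a pre-existing cluster.id so it matches the ID the chart
# injects via the KAFKA_KRAFT_CLUSTER_ID environment variable.
META=/bitnami/kafka/data/meta.properties
if [ -f "$META" ]; then
  sed -i "s/^cluster\.id=.*/cluster.id=${KAFKA_KRAFT_CLUSTER_ID}/" "$META"
fi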

nljakubiak (Contributor, Author) commented Feb 10, 2025

I will close the issue. I started investigating it a bit more. It seems that if you create a new kafka cluster, you can scale up the number of brokers and they get a consistent clusterID.

I noticed the problem occurs when the kafka cluster was created with helm chart version 25.1.5 (or even 18.X), then upgraded to 31.2.0 (along with the docker image versions), and then scaled up.
