Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

operator v1: do not decom broker if it's missing in RP #317

Merged
merged 1 commit into from
Nov 19, 2024

Conversation

birdayz
Copy link
Contributor

@birdayz birdayz commented Nov 19, 2024

avoid setting decommissionBrokerID for an already-decommissioned-broker by also checking in redpanda if that broker exists.

lots of context can be found in: #314

@chrisseto
Copy link
Contributor

Thus far, I like the approach here the most but it still feels like we're papering over the larger issue here: A lack of idempotency and crash safety.

Stale reads are a symptom of the above but we could easily hit the same problem if the operator crashes (or gets evicted) at an inopportune time.

Couldn't we largely eliminate the problem if the handleScaling function pulled state information from the StatefulSet and adminAPI and pushed that into .Status instead of relying on .Status? (Similar to what Rafal has mentioned in Slack).

It doesn't seem like that large of a change to me and I suspect a lot of logic in this code path could be simplified.

For example, handleDecommissionInProgress could first fetch the brokers list and brokers uuid list from the admin API. If there's a broker that's in the process of decommissioning, you know there's a decommission in progress. With additional information from the StatefulSets/Pod list you should be able to deduce if the decommission hasn't yet started or if it's finished.

avoid setting decommissionBrokerID for an already-decommissioned-broker
by also checking in redpanda if that broker exists.
@birdayz birdayz force-pushed the jb/do-not-decom-missing-broker branch from ad769be to 0bc1b40 Compare November 19, 2024 17:15
@birdayz
Copy link
Contributor Author

birdayz commented Nov 19, 2024

Regarding crash safety, the same check could be added in the cluster_controller.go, and clear the status decommissionBrokerID if required.

@RafalKorepta RafalKorepta merged commit 942191b into main Nov 19, 2024
5 checks passed
@chrisseto chrisseto deleted the jb/do-not-decom-missing-broker branch November 19, 2024 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants