Operator does not ensure that a clickhouse keeper pod is running before proceeding with the restart of another pod #1598
@mandreasik, please use 0.24.2 or later. It works differently, though it still allows some downtime: the 1st and 2nd nodes are restarted almost at the same time:
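For anyone following along, a minimal upgrade sketch (the manifest URL/tag and the `kube-system` namespace are assumptions; verify against the release docs):

```bash
# Sketch: upgrade clickhouse-operator to 0.24.2. The manifest URL/tag and
# the kube-system namespace are assumptions; check the release notes for
# the exact install bundle location.
kubectl apply -f https://raw.githubusercontent.com/Altinity/clickhouse-operator/0.24.2/deploy/operator/clickhouse-operator-install-bundle.yaml

# Confirm which operator image is actually running.
kubectl -n kube-system get deployment clickhouse-operator \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```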
@alex-zaitsev I've tried version 0.24.2 as suggested and found another issue. I've tested it on k3d 1.28.x and also on 1.29.x, and I got different results, but both show very weird behavior. Steps to reproduce:
I've deleted the
If the cluster starts without any issues, repeat steps 3 and 4. From my tests, I observed two outcomes:
Could you perform such tests on your environment?
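For reproducing this, a quick way to watch the restart ordering and each keeper's quorum state is sketched below (the namespace, pod names, and the availability of `nc` inside the container are all assumptions based on the names in this issue):

```bash
# In one terminal: watch the keeper pods while re-applying the manifest.
# The grep filter is an assumption based on the pod names in this issue.
kubectl -n clickhouse get pods -w | grep cluster1

# In another terminal: ask each keeper for its role and follower count via
# the "mntr" four-letter command (9181 is Keeper's default client port).
for pod in cluster1-0-0-0 cluster1-0-1-0 cluster1-0-2-0; do
  echo "--- $pod ---"
  kubectl -n clickhouse exec "$pod" -- bash -c 'echo mntr | nc -q1 localhost 9181' \
    | grep -E 'zk_server_state|zk_synced_followers'
done
```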
Hi,
I'm using clickhouse-operator version 0.24.0 and I've encountered the following issue:
When applying a new change to a ClickHouseKeeper cluster, the operator does not ensure that a ClickHouseKeeper pod is running before proceeding with the restart of another pod (even though the previous one is still being created).
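For context, the installation involved is of roughly this shape (a minimal sketch; the name, namespace, and 3-replica layout are assumptions inferred from the pod names below):

```bash
# Sketch: apply a minimal 3-replica keeper installation. The names and
# namespace are assumptions; the pod names in this issue suggest a
# cluster called "cluster1" with three replicas.
kubectl apply -n clickhouse -f - <<'EOF'
apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: keeper
spec:
  configuration:
    clusters:
      - name: "cluster1"
        layout:
          replicasCount: 3
EOF
```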
Let's look at the status of the pods when I changed the `ClickHouseKeeperInstallation`. The cluster is applying the new change:
So far so good, but let's see what happens next:
And here goes our problem:
As you can see, pod `cluster1-0-1-0` is still in the `ContainerCreating` state, but the operator has already decided to terminate pod `cluster1-0-2-0`. This caused the cluster to lose quorum for a short time, which ClickHouse did not like, resulting in the following error:
I was expecting the ClickHouse Keeper cluster to apply new changes without any disruption to the ClickHouse cluster.
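In other words, the expected behavior is equivalent to gating each restart on the previous replica becoming Ready again, which can be approximated manually with `kubectl wait` (a sketch; namespace and pod names are assumptions):

```bash
# Sketch: restart keeper replicas one at a time, never touching the next
# replica until the previous one is Ready again; names are assumptions.
for pod in cluster1-0-0-0 cluster1-0-1-0 cluster1-0-2-0; do
  kubectl -n clickhouse delete pod "$pod"
  kubectl -n clickhouse wait --for=condition=Ready "pod/$pod" --timeout=300s
done
```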