
Operator does not ensure that a clickhouse keeper pod is running before proceeding with the restart of another pod #1598

mandreasik opened this issue Dec 12, 2024 · 2 comments

@mandreasik

Hi,

I'm using clickhouse-operator version 0.24.0 and I've encountered the following issue:

When applying a new change to a ClickHouseKeeper cluster, the operator does not ensure that a ClickHouseKeeper pod is running before proceeding with the restart of another pod (it moves on even though the previous pod is still being created).
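
What I would expect between restarts is, roughly, the following readiness gate (a minimal kubectl sketch with my pod names, only to illustrate the expected ordering; the operator would of course do this internally rather than via kubectl):

# wait until the recreated replica is Ready before restarting the next one
kubectl wait --for=condition=Ready pod/chk-extended-cluster1-0-0-0 --timeout=300s
# only then proceed with chk-extended-cluster1-0-1-0, and so on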

Let's look at the status of the pods after I changed the ClickHouseKeeperInstallation.

The cluster is applying the new change:

NAME                          READY   STATUS        RESTARTS   AGE
chk-extended-cluster1-0-0-0   1/1     Terminating   0          6m10s
chk-extended-cluster1-0-1-0   1/1     Running       0          6m10s
chk-extended-cluster1-0-2-0   1/1     Running       0          5m25s
(...)
NAME                          READY   STATUS              RESTARTS   AGE
chk-extended-cluster1-0-0-0   0/1     ContainerCreating   0          65s
chk-extended-cluster1-0-1-0   1/1     Running             0          7m19s
chk-extended-cluster1-0-2-0   1/1     Running             0          6m34s
(...)
NAME                          READY   STATUS    RESTARTS   AGE
chk-extended-cluster1-0-0-0   1/1     Running   0          78s
chk-extended-cluster1-0-1-0   1/1     Running   0          7m32s
chk-extended-cluster1-0-2-0   1/1     Running   0          6m47s

So far so good, but let's see what happens next:

NAME                          READY   STATUS        RESTARTS   AGE
chk-extended-cluster1-0-0-0   1/1     Running       0          81s
chk-extended-cluster1-0-1-0   1/1     Terminating   0          7m35s
chk-extended-cluster1-0-2-0   1/1     Running       0          6m50s
(...)
NAME                          READY   STATUS    RESTARTS   AGE
chk-extended-cluster1-0-0-0   1/1     Running   0          82s
chk-extended-cluster1-0-2-0   1/1     Running   0          6m51s

And here is the problem:

NAME                          READY   STATUS              RESTARTS   AGE
chk-extended-cluster1-0-0-0   1/1     Running             0          86s
chk-extended-cluster1-0-1-0   0/1     ContainerCreating   0          1s
chk-extended-cluster1-0-2-0   1/1     Terminating         0          6m55s

As you can see, pod cluster1-0-1-0 is still in the ContainerCreating state, but the operator has already decided to terminate pod cluster1-0-2-0.

This caused the cluster to lose quorum for a short time, which ClickHouse did not like, resulting in the following error:

error": "(CreateMemoryTableQueryOnCluster) Error when executing query: code: 999, message: All connection tries failed while connecting to ZooKeeper. nodes: 10.233.71.16:9181, 10.233.81.20:9181, 10.233.70.35:9181\nCode: 999. Coordination::Exception: Keeper server rejected the connection during the handshake. Possibly it's overloaded, doesn't see leader or is stale: while receiving handshake from ZooKeeper. (KEEPER_EXCEPTION) (version 24.8.2.3 (official build)), 10.233.71.16:9181\nPoco::Exception. Code: 1000, e.code() = 111, Connection refused (version 24.8.2.3 (official build))

I was expecting the ClickHouse Keeper cluster to apply new changes without any disruption to the ClickHouse cluster.
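
For what it's worth, the leader/quorum state of each Keeper node can be watched during the rollout via the four-letter-word interface on the client port 9181 (assuming four-letter-word commands are whitelisted, as in the configuration I post later in this thread, and that nc is available in the keeper container):

for pod in chk-extended-cluster1-0-0-0 chk-extended-cluster1-0-1-0 chk-extended-cluster1-0-2-0; do
  echo "== $pod =="
  # mntr reports zk_server_state (leader/follower); a node that does not see a leader rejects handshakes
  kubectl exec "$pod" -- sh -c 'echo mntr | nc localhost 9181' | grep zk_server_state
done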

@alex-zaitsev
Member

@mandreasik , please use 0.24.2 or later. It works differently, though it still allows some downtime. The 1st and 2nd nodes are restarted almost at the same time:

test-049-2c940a34-c428-11ef-9ccd-acde48001122   chk-clickhouse-keeper-test-0-0-0                       1/1     Running   0                57s
test-049-2c940a34-c428-11ef-9ccd-acde48001122   chk-clickhouse-keeper-test-0-1-0                       1/1     Running   0                54s
test-049-2c940a34-c428-11ef-9ccd-acde48001122   chk-clickhouse-keeper-test-0-2-0                       1/1     Running   0                22s
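
For reference, you can double-check which operator version is actually running before re-testing. A quick check, assuming the operator was deployed with the default install bundle into kube-system (adjust the namespace otherwise):

# prints the operator image tag(s); expect 0.24.2 or later
kubectl -n kube-system get deployment clickhouse-operator -o jsonpath='{.spec.template.spec.containers[*].image}'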

@mandreasik
Author

@alex-zaitsev I've tried version 0.24.2 as suggested and found another issue. I tested it on k3d 1.28.x and also on 1.29.x; I got different results, but both showed very weird behavior.

Steps to reproduce:

  1. Create a ClickHouse Keeper cluster using the following example configuration:
apiVersion: "clickhouse-keeper.altinity.com/v1"
kind: "ClickHouseKeeperInstallation"
metadata:
  name: extended
spec:
  configuration:
    clusters:
      - name: "cluster1"
        layout:
          replicasCount: 3
    settings:
      logger/level: "trace"
      logger/console: "true"
      listen_host: "0.0.0.0"
      keeper_server/four_letter_word_white_list: "*"
      keeper_server/coordination_settings/raft_logs_level: "information"
      prometheus/endpoint: "/metrics"
      prometheus/port: "7000"
      prometheus/metrics: "true"
      prometheus/events: "true"
      prometheus/asynchronous_metrics: "true"
      prometheus/status_info: "false"
      keeper_server/coordination_settings/force_sync: "false"

  defaults:
    templates:
      # Templates are specified as default for all clusters
      podTemplate: default
      dataVolumeClaimTemplate: default

  templates:
    podTemplates:
      - name: default
        metadata:
          labels:
            app: clickhouse-keeper
        spec:
          containers:
            - name: clickhouse-keeper
              imagePullPolicy: IfNotPresent
              image: "clickhouse/clickhouse-keeper:latest"
              resources:
                requests:
                  memory: "256M"
                  cpu: "1"
                limits:
                  memory: "4Gi"
                  cpu: "2"
          securityContext:
            fsGroup: 101

    volumeClaimTemplates:
      - name: default
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 1Gi

I've deleted the affinity section from the example to allow the pods to be scheduled on the same node.

  2. Wait until the CHK cluster is in a "Completed" state (kubectl get chk).
  3. Modify one key in the settings section. In my case it was keeper_server/coordination_settings/force_sync, which I changed from "false" to "true".
  4. Observe what happens with the cluster.

If the cluster comes back up without any issues, repeat steps 3 and 4.
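
For completeness, the command sequence for the steps above looks roughly like this (assuming the manifest is saved as keeper.yaml and applied to the default namespace):

kubectl apply -f keeper.yaml     # step 1: create the CHK cluster
kubectl get chk extended         # step 2: wait until the status shows Completed
# step 3: flip keeper_server/coordination_settings/force_sync to "true" in keeper.yaml, then:
kubectl apply -f keeper.yaml
kubectl get pods -w              # step 4: watch the restart sequence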

From my tests, I observed two outcomes:

  1. After the cluster was modified, the operator restarted all pods twice. It waited for the pods to be alive and then decided to perform a second round of restarts.
  2. After the cluster was modified, one or more StatefulSets were scaled down to 0 but never scaled back up. The CHK status remained Completed.
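
For reference, the second outcome is easy to spot: one of the chk-extended-* StatefulSets stays scaled to 0 while the CHK status still reads Completed, e.g.:

kubectl get statefulsets         # a chk-extended-* StatefulSet stuck at 0/0
kubectl get chk extended         # status still reports Completed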

Could you perform such tests in your environment?
