➜ ~ kbcli version
Kubernetes: v1.27.3-gke.100
KubeBlocks: 0.7.0-alpha.4
kbcli: 0.7.0-alpha.4
Due to issue #4959, the role of the cluster's only pod changed from leader to follower and then to none, and the backuppolicy's labelsSelector changed accordingly. However, the selector does not update immediately when the role changes, so a backup performed during that gap fails.
A backup attempted during the gap period fails with the following events:
Warning ApplyResourcesFailed 9m45s cluster-controller error invoking binding mysql/leaveMember: rpc error: code = Unavailable desc = error reading from server: EOF
Normal BackupJobCreate 6m3s (x2 over 13m) cluster-controller Create backupJob/mysqltest2-mysql-scaling
Warning Unhealthy 92s (x4 over 6m32s) event-controller Pod mysqltest2-mysql-0: Readiness probe failed: error: health rpc failed: rpc error: code = Unknown desc = {"event":"Success","originalRole":"candidate","role":"candidate"}
Warning BackupFailed 92s (x18 over 5m59s) cluster-controller backup for horizontalScaling failed: can not find any pod to backup by labelsSelector
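The failure is consistent with plain label-selector matching: once the role label is dropped from the pod, a selector that still requires the old role matches nothing. A minimal sketch with `k8s.io/apimachinery`, assuming the policy selects on a `kubeblocks.io/role` label (the exact label keys here are illustrative, not copied from the actual BackupPolicy):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/labels"
)

func main() {
	// Selector the backuppolicy might carry while the pod is still leader
	// (label keys are assumptions for illustration).
	sel := labels.SelectorFromSet(labels.Set{
		"app.kubernetes.io/instance": "mysqltest2",
		"kubeblocks.io/role":         "leader",
	})

	// Pod labels during the gap: the role label has been removed.
	podLabels := labels.Set{
		"app.kubernetes.io/instance": "mysqltest2",
	}

	// Prints "false": no pod matches, so the backup fails with
	// "can not find any pod to backup by labelsSelector".
	fmt.Println(sel.Matches(podLabels))
}
```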
This is by design: while the pod carries no role label, the backup controller cannot find a target pod to back up.
Once the cluster returns to normal and the proper role label is restored, backups proceed normally. The current backup fault tolerance is insufficient; a retry mechanism for failed backups should be added going forward.
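Such a retry could be as simple as re-listing the target pods with backoff until the role label reappears. A minimal sketch using the backoff helper from `k8s.io/apimachinery`; `findTargetPod` is a hypothetical stand-in for the controller's pod lookup, not the actual KubeBlocks code:

```go
package main

import (
	"errors"
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// findTargetPod stands in for the controller's lookup of backup target
// pods via the policy's labelsSelector; here it always fails, as it
// would during the gap period.
func findTargetPod() error {
	return errors.New("can not find any pod to backup by labelsSelector")
}

func main() {
	// Up to 5 attempts, backing off 5s, 10s, 20s, 40s between tries.
	backoff := wait.Backoff{Duration: 5 * time.Second, Factor: 2.0, Steps: 5}
	err := wait.ExponentialBackoff(backoff, func() (bool, error) {
		if err := findTargetPod(); err != nil {
			fmt.Println("target pod not found, will retry:", err)
			return false, nil // keep retrying until the role label reappears
		}
		return true, nil // pod found; proceed with the backup
	})
	if err != nil {
		fmt.Println("giving up after retries:", err)
	}
}
```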