Controller-manager restarting after machine reboot #39
Description
On a Kubernetes cluster, after the machine reboots, the controller-manager pod (and only that pod) restarts repeatedly until the manager is redeployed. The log file points to a leader election problem, as shown in the fragment below:
E0530 12:18:29.937141 1 leaderelection.go:367] Failed to update lock: client rate limiter Wait returned an error: context deadline exceeded
{"level":"debug","ts":"2023-05-30T12:18:30Z","logger":"events","msg":"controller-manager-664469c876-9t8xf_31ee84ab-896f-4835-af8f-52598c6b719f stopped leading","type":"Normal","object":{"kind":"Lease","namespace":"intel-power","name":"power-operator-6846766c","uid":"9526a048-3cb0-41a3-96a4-d83320938624","apiVersion":"coordination.k8s.io/v1","resourceVersion":"4690743"},"reason":"LeaderElection"}
I0530 12:18:31.031630 1 leaderelection.go:283] failed to renew lease intel-power/power-operator-6846766c: timed out waiting for the condition
{"level":"info","ts":"2023-05-30T12:18:34Z","msg":"Stopping and waiting for non leader election runnables"}
{"level":"info","ts":"2023-05-30T12:18:35Z","msg":"Stopping and waiting for leader election runnables"}
{"level":"info","ts":"2023-05-30T12:18:36Z","msg":"Shutdown signal received, waiting for all workers to finish","controller":"powerconfig","controllerGroup":"power.intel.com","controllerKind":"PowerConfig"}
{"level":"error","ts":"2023-05-30T12:18:34Z","logger":"setup","msg":"problem running manager","error":"leader election lost","stacktrace":"main.main\n\t/workspace/main.go:98\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"}
To reproduce:
- Deploy Kubernetes Power Manager normally;
- Verify that all pods are running correctly;
- Reboot the machine with "sudo reboot";
- After the reboot, the controller-manager fails leader election and restarts repeatedly;
- Redeploying the controller-manager restores normal operation.