Labels
- help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
- kind/bug: Categorizes issue or PR as related to a bug.
- triage/needs-information: Indicates an issue needs more information in order to work on it.
Description
Observed Behavior:
- Karpenter does not drain the node before sending the node shutdown signal to the kubelet.
- Kubelet logs for one node and the Karpenter logs for the same node are attached below. Note the timeline across the two sets of logs: the leading timestamp on each line is 5 hours 30 minutes ahead of the UTC timestamps embedded in the messages.
- Related kubelet logs:
2024-12-03 14:31:59.602 {"stime":"Dec 3 09:01:59","pid":"2220","message":"I1203 09:01:59.602338 2220 nodeshutdown_manager_linux.go:265] \"Shutdown manager detected new shutdown event, isNodeShuttingDownNow\" event=true"}
2024-12-03 14:31:59.602 {"stime":"Dec 3 09:01:59","pid":"2220","message":"I1203 09:01:59.602393 2220 nodeshutdown_manager_linux.go:322] \"Shutdown manager processing shutdown event\""}
2024-12-03 14:31:59.604 {"stime":"Dec 3 09:01:59","pid":"2220","message":"I1203 09:01:59.604475 2220 kubelet_node_status.go:669] \"Recording event message for node\" node=\"i-xxx\" event=\"NodeNotReady\""}
2024-12-03 14:31:59.604 {"stime":"Dec 3 09:01:59","pid":"2220","message":"I1203 09:01:59.604510 2220 setters.go:552] \"Node became not ready\" node=\"i-xxx\" condition={\"type\":\"Ready\",\"status\":\"False\",\"lastHeartbeatTime\":\"2024-12-03T09:01:59Z\",\"lastTransitionTime\":\"2024-12-03T09:01:59Z\",\"reason\":\"KubeletNotReady\",\"message\":\"node is shutting down\"}"}
2024-12-03 14:31:59.605 {"stime":"Dec 3 09:01:59","pid":"2220","message":"I1203 09:01:59.605119 2220 nodeshutdown_manager_linux.go:375] \"Shutdown manager killing pod with gracePeriod\" pod=\"kube-system/kube-proxy-i-xxx\" gracePeriod=20"}
2024-12-03 14:31:59.605 {"stime":"Dec 3 09:01:59","pid":"2220","message":"I1203 09:01:59.605294 2220 kuberuntime_container.go:745] \"Killing container with a grace period\" pod=\"kube-system/kube-proxy-i-xxx\" podUID=\"e39ab0aac325868d61054ba7f351a6fe\" containerName=\"kube-proxy\" containerID=\"containerd://3e44355e38045e3c954ec8b4f38d022c65c17ed5a21f181330b6ac6b55cc199f\" gracePeriod=20"}
2024-12-03 14:31:59.605 {"stime":"Dec 3 09:01:59","pid":"1933","message":"time=\"2024-12-03T09:01:59.605566402Z\" level=info msg=\"StopContainer for \\\"3e44355e38045e3c954ec8b4f38d022c65c17ed5a21f181330b6ac6b55cc199f\\\" with timeout 20 (s)\""}
2024-12-03 14:31:59.606 {"stime":"Dec 3 09:01:59","pid":"1933","message":"time=\"2024-12-03T09:01:59.605917479Z\" level=info msg=\"Stop container \\\"3e44355e38045e3c954ec8b4f38d022c65c17ed5a21f181330b6ac6b55cc199f\\\" with signal terminated\""}
2024-12-03 14:31:59.719 {"stime":"Dec 3 09:01:59","pid":"2220","message":"I1203 09:01:59.719789 2220 nodeshutdown_manager_linux.go:395] \"Shutdown manager finished killing pod\" pod=\"kube-system/kube-proxy-i-xxx\""}
------ similar logs for other pods ------
2024-12-03 14:31:59.720 {"stime":"Dec 3 09:01:59","pid":"2220","message":"I1203 09:01:59.719828 2220 nodeshutdown_manager_linux.go:375] \"Shutdown manager killing pod with gracePeriod\" pod=\"logging/fluent-bit-sgkn4\" gracePeriod=10"}
...
- Related Karpenter logs:
2024-12-03 14:29:40.306 {"host":"*.*.*.*","log":"{\"level\":\"DEBUG\",\"time\":\"2024-12-03T08:59:40.305Z\",\"logger\":\"controller\",\"caller\":\"disruption/controller.go:91\",\"message\":\"marking consolidatable\",\"commit\":\"6174c75\",\"controller\":\"nodeclaim.disruption\",\"controllerGroup\":\"karpenter.sh\",\"controllerKind\":\"NodeClaim\",\"NodeClaim\":{\"name\":\"karpenter-worker-nodes-1-xxx\"},\"namespace\":\"\",\"name\":\"karpenter-worker-nodes-1-xxx\",\"reconcileID\":\"3dc66943-d53f-431b-910d-28b3cdb48b46\"}","stime":"2024-12-03T08:59:40.305913753Z"}
2024-12-03 14:30:27.577 {"host":"*.*.*.*","log":"{\"level\":\"INFO\",\"time\":\"2024-12-03T09:00:27.577Z\",\"logger\":\"controller\",\"caller\":\"disruption/controller.go:183\",\"message\":\"disrupting nodeclaim(s) via delete, terminating 1 nodes (3 pods) i-xxx/c6a.4xlarge/on-demand\",\"commit\":\"6174c75\",\"controller\":\"disruption\",\"namespace\":\"\",\"name\":\"\",\"reconcileID\":\"fe254e07-a4da-49a3-b57b-38ffa4f46f05\",\"command-id\":\"cdbcfbba-f6ba-46f4-b53a-baa5b9bbbb29\",\"reason\":\"underutilized\"}","stime":"2024-12-03T09:00:27.577206208Z"}
2024-12-03 14:30:27.699 {"host":"*.*.*.*","log":"{\"level\":\"DEBUG\",\"time\":\"2024-12-03T09:00:27.698Z\",\"logger\":\"controller\",\"caller\":\"singleton/controller.go:26\",\"message\":\"command succeeded\",\"commit\":\"6174c75\",\"controller\":\"disruption.queue\",\"namespace\":\"\",\"name\":\"\",\"reconcileID\":\"991b7173-caf1-447f-87bc-8ead2cb33fc4\",\"command-id\":\"cdbcfbba-f6ba-46f4-b53a-baa5b9bbbb29\"}","stime":"2024-12-03T09:00:27.699034416Z"}
2024-12-03 14:30:27.721 {"host":"*.*.*.*","log":"{\"level\":\"INFO\",\"time\":\"2024-12-03T09:00:27.721Z\",\"logger\":\"controller\",\"caller\":\"termination/controller.go:105\",\"message\":\"tainted node\",\"commit\":\"6174c75\",\"controller\":\"node.termination\",\"controllerGroup\":\"\",\"controllerKind\":\"Node\",\"Node\":{\"name\":\"i-05aca8638c296692b\"},\"namespace\":\"\",\"name\":\"i-xxx\",\"reconcileID\":\"e1161f8b-9d53-4a72-ae5e-93ba019ce257\",\"taint.Key\":\"karpenter.sh/disrupted\",\"taint.Value\":\"\",\"taint.Effect\":\"NoSchedule\"}","stime":"2024-12-03T09:00:27.721565969Z"}
2024-12-03 14:32:40.974 {"host":"*.*.*.*","log":"{\"level\":\"INFO\",\"time\":\"2024-12-03T09:02:40.974Z\",\"logger\":\"controller\",\"caller\":\"termination/controller.go:165\",\"message\":\"deleted node\",\"commit\":\"6174c75\",\"controller\":\"node.termination\",\"controllerGroup\":\"\",\"controllerKind\":\"Node\",\"Node\":{\"name\":\"i-05aca8638c296692b\"},\"namespace\":\"\",\"name\":\"i-xxx\",\"reconcileID\":\"4f9983ef-e372-4702-835e-fac3da09baff\"}","stime":"2024-12-03T09:02:40.974320106Z"}
2024-12-03 14:32:41.313 {"host":"*.*.*.*","log":"{\"level\":\"INFO\",\"time\":\"2024-12-03T09:02:41.312Z\",\"logger\":\"controller\",\"caller\":\"termination/controller.go:79\",\"message\":\"deleted nodeclaim\",\"commit\":\"6174c75\",\"controller\":\"nodeclaim.termination\",\"controllerGroup\":\"karpenter.sh\",\"controllerKind\":\"NodeClaim\",\"NodeClaim\":{\"name\":\"karpenter-worker-nodes-1-xxx\"},\"namespace\":\"\",\"name\":\"karpenter-worker-nodes-1-xxx\",\"reconcileID\":\"085b45ab-5d86-4150-b046-2526aaf9f5ab\",\"Node\":{\"name\":\"i-xxx\"},\"provider-id\":\"aws:///ap-south-1c/i-xxx\"}","stime":"2024-12-03T09:02:41.313115195Z"}
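To make the timeline easier to compare, here is a minimal Python sketch that lines up four of the events above and prints each one's offset from the first. The timestamps are copied from the logs; the event labels and the `timeline` helper are illustrative names, not part of Karpenter or the kubelet. It assumes the klog timestamp (09:01:59.602338) is UTC, which the `lastHeartbeatTime` of `2024-12-03T09:01:59Z` in the same log suggests.

```python
from datetime import datetime

# Timestamps copied from the logs above. Assumption: the klog timestamp
# (09:01:59.602338) is UTC, like the "Z"-suffixed Karpenter timestamps.
FMT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Labels are hypothetical shorthand for the corresponding log messages.
EVENTS = [
    ("karpenter: disrupting nodeclaim(s) via delete", "2024-12-03T09:00:27.577Z"),
    ("karpenter: tainted node", "2024-12-03T09:00:27.721Z"),
    ("kubelet: shutdown event detected", "2024-12-03T09:01:59.602338Z"),
    ("karpenter: deleted node", "2024-12-03T09:02:40.974Z"),
]


def timeline(events):
    """Return (label, seconds since the first event) pairs, in time order."""
    parsed = sorted((datetime.strptime(ts, FMT), label) for label, ts in events)
    start = parsed[0][0]
    return [(label, (when - start).total_seconds()) for when, label in parsed]


if __name__ == "__main__":
    for label, offset in timeline(EVENTS):
        print(f"+{offset:8.3f}s  {label}")
```

Under that assumption, the kubelet detects the shutdown event roughly 92 seconds after the disruption command is logged, and Karpenter logs "deleted node" roughly 41 seconds after that.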
Expected Behavior:
- As described in the Karpenter documentation, Karpenter should first cordon and drain the node, and only then trigger node termination.
Versions:
- Chart Version: v1.0.6
- Kubernetes Version (kubectl version): 1.28.12
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment