Karpenter does not drain the node before sending shutdown signal #1894

@ShubhamKr11

Description

Observed Behavior:

  • Karpenter does not drain the node before the node shutdown signal reaches the kubelet.
  • Kubelet logs and Karpenter logs for the same node are included below. Please compare the timestamps in the two sets of logs.
  • Kubelet logs:
2024-12-03 14:31:59.602	{"stime":"Dec  3 09:01:59","pid":"2220","message":"I1203 09:01:59.602338    2220 nodeshutdown_manager_linux.go:265] \"Shutdown manager detected new shutdown event, isNodeShuttingDownNow\" event=true"}
2024-12-03 14:31:59.602	{"stime":"Dec  3 09:01:59","pid":"2220","message":"I1203 09:01:59.602393    2220 nodeshutdown_manager_linux.go:322] \"Shutdown manager processing shutdown event\""}
2024-12-03 14:31:59.604	{"stime":"Dec  3 09:01:59","pid":"2220","message":"I1203 09:01:59.604475    2220 kubelet_node_status.go:669] \"Recording event message for node\" node=\"i-xxx\" event=\"NodeNotReady\""}
2024-12-03 14:31:59.604	{"stime":"Dec  3 09:01:59","pid":"2220","message":"I1203 09:01:59.604510    2220 setters.go:552] \"Node became not ready\" node=\"i-xxx\" condition={\"type\":\"Ready\",\"status\":\"False\",\"lastHeartbeatTime\":\"2024-12-03T09:01:59Z\",\"lastTransitionTime\":\"2024-12-03T09:01:59Z\",\"reason\":\"KubeletNotReady\",\"message\":\"node is shutting down\"}"}
2024-12-03 14:31:59.605	{"stime":"Dec  3 09:01:59","pid":"2220","message":"I1203 09:01:59.605119    2220 nodeshutdown_manager_linux.go:375] \"Shutdown manager killing pod with gracePeriod\" pod=\"kube-system/kube-proxy-i-xxx\" gracePeriod=20"}
2024-12-03 14:31:59.605	{"stime":"Dec  3 09:01:59","pid":"2220","message":"I1203 09:01:59.605294    2220 kuberuntime_container.go:745] \"Killing container with a grace period\" pod=\"kube-system/kube-proxy-i-xxx\" podUID=\"e39ab0aac325868d61054ba7f351a6fe\" containerName=\"kube-proxy\" containerID=\"containerd://3e44355e38045e3c954ec8b4f38d022c65c17ed5a21f181330b6ac6b55cc199f\" gracePeriod=20"}
2024-12-03 14:31:59.605	{"stime":"Dec  3 09:01:59","pid":"1933","message":"time=\"2024-12-03T09:01:59.605566402Z\" level=info msg=\"StopContainer for \\\"3e44355e38045e3c954ec8b4f38d022c65c17ed5a21f181330b6ac6b55cc199f\\\" with timeout 20 (s)\""}
2024-12-03 14:31:59.606	{"stime":"Dec  3 09:01:59","pid":"1933","message":"time=\"2024-12-03T09:01:59.605917479Z\" level=info msg=\"Stop container \\\"3e44355e38045e3c954ec8b4f38d022c65c17ed5a21f181330b6ac6b55cc199f\\\" with signal terminated\""}
2024-12-03 14:31:59.719	{"stime":"Dec  3 09:01:59","pid":"2220","message":"I1203 09:01:59.719789    2220 nodeshutdown_manager_linux.go:395] \"Shutdown manager finished killing pod\" pod=\"kube-system/kube-proxy-i-xxx\""}
------ similar logs for other pods ------ 
2024-12-03 14:31:59.720	{"stime":"Dec  3 09:01:59","pid":"2220","message":"I1203 09:01:59.719828    2220 nodeshutdown_manager_linux.go:375] \"Shutdown manager killing pod with gracePeriod\" pod=\"logging/fluent-bit-sgkn4\" gracePeriod=10"}
...
  • Karpenter logs:
2024-12-03 14:29:40.306	{"host":"*.*.*.*","log":"{\"level\":\"DEBUG\",\"time\":\"2024-12-03T08:59:40.305Z\",\"logger\":\"controller\",\"caller\":\"disruption/controller.go:91\",\"message\":\"marking consolidatable\",\"commit\":\"6174c75\",\"controller\":\"nodeclaim.disruption\",\"controllerGroup\":\"karpenter.sh\",\"controllerKind\":\"NodeClaim\",\"NodeClaim\":{\"name\":\"karpenter-worker-nodes-1-xxx\"},\"namespace\":\"\",\"name\":\"karpenter-worker-nodes-1-xxx\",\"reconcileID\":\"3dc66943-d53f-431b-910d-28b3cdb48b46\"}","stime":"2024-12-03T08:59:40.305913753Z"}
2024-12-03 14:30:27.577	{"host":"*.*.*.*","log":"{\"level\":\"INFO\",\"time\":\"2024-12-03T09:00:27.577Z\",\"logger\":\"controller\",\"caller\":\"disruption/controller.go:183\",\"message\":\"disrupting nodeclaim(s) via delete, terminating 1 nodes (3 pods) i-xxx/c6a.4xlarge/on-demand\",\"commit\":\"6174c75\",\"controller\":\"disruption\",\"namespace\":\"\",\"name\":\"\",\"reconcileID\":\"fe254e07-a4da-49a3-b57b-38ffa4f46f05\",\"command-id\":\"cdbcfbba-f6ba-46f4-b53a-baa5b9bbbb29\",\"reason\":\"underutilized\"}","stime":"2024-12-03T09:00:27.577206208Z"}
2024-12-03 14:30:27.699	{"host":"*.*.*.*","log":"{\"level\":\"DEBUG\",\"time\":\"2024-12-03T09:00:27.698Z\",\"logger\":\"controller\",\"caller\":\"singleton/controller.go:26\",\"message\":\"command succeeded\",\"commit\":\"6174c75\",\"controller\":\"disruption.queue\",\"namespace\":\"\",\"name\":\"\",\"reconcileID\":\"991b7173-caf1-447f-87bc-8ead2cb33fc4\",\"command-id\":\"cdbcfbba-f6ba-46f4-b53a-baa5b9bbbb29\"}","stime":"2024-12-03T09:00:27.699034416Z"}
2024-12-03 14:30:27.721	{"host":"*.*.*.*","log":"{\"level\":\"INFO\",\"time\":\"2024-12-03T09:00:27.721Z\",\"logger\":\"controller\",\"caller\":\"termination/controller.go:105\",\"message\":\"tainted node\",\"commit\":\"6174c75\",\"controller\":\"node.termination\",\"controllerGroup\":\"\",\"controllerKind\":\"Node\",\"Node\":{\"name\":\"i-05aca8638c296692b\"},\"namespace\":\"\",\"name\":\"i-xxx\",\"reconcileID\":\"e1161f8b-9d53-4a72-ae5e-93ba019ce257\",\"taint.Key\":\"karpenter.sh/disrupted\",\"taint.Value\":\"\",\"taint.Effect\":\"NoSchedule\"}","stime":"2024-12-03T09:00:27.721565969Z"}
2024-12-03 14:32:40.974	{"host":"*.*.*.*","log":"{\"level\":\"INFO\",\"time\":\"2024-12-03T09:02:40.974Z\",\"logger\":\"controller\",\"caller\":\"termination/controller.go:165\",\"message\":\"deleted node\",\"commit\":\"6174c75\",\"controller\":\"node.termination\",\"controllerGroup\":\"\",\"controllerKind\":\"Node\",\"Node\":{\"name\":\"i-05aca8638c296692b\"},\"namespace\":\"\",\"name\":\"i-xxx\",\"reconcileID\":\"4f9983ef-e372-4702-835e-fac3da09baff\"}","stime":"2024-12-03T09:02:40.974320106Z"}
2024-12-03 14:32:41.313	{"host":"*.*.*.*","log":"{\"level\":\"INFO\",\"time\":\"2024-12-03T09:02:41.312Z\",\"logger\":\"controller\",\"caller\":\"termination/controller.go:79\",\"message\":\"deleted nodeclaim\",\"commit\":\"6174c75\",\"controller\":\"nodeclaim.termination\",\"controllerGroup\":\"karpenter.sh\",\"controllerKind\":\"NodeClaim\",\"NodeClaim\":{\"name\":\"karpenter-worker-nodes-1-xxx\"},\"namespace\":\"\",\"name\":\"karpenter-worker-nodes-1-xxx\",\"reconcileID\":\"085b45ab-5d86-4150-b046-2526aaf9f5ab\",\"Node\":{\"name\":\"i-xxx\"},\"provider-id\":\"aws:///ap-south-1c/i-xxx\"}","stime":"2024-12-03T09:02:41.313115195Z"}
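
To make the timeline in the two logs easier to compare, here is a minimal sketch that orders the key events and prints each one's offset from the first. The timestamps are copied from the inner (UTC) fields of the log lines above; the leading column of each line appears to be local time (UTC+5:30), which is an assumption. The event labels and the helper names `parse` and `timeline` are illustrative only and are not part of Karpenter or the kubelet. The sketch only orders the events; it does not show what triggered the shutdown.

```python
from datetime import datetime

# UTC timestamps copied from the log lines above (millisecond precision).
# The labels are illustrative names chosen for this sketch.
EVENTS = {
    "karpenter: disrupting nodeclaim": "2024-12-03T09:00:27.577Z",
    "karpenter: tainted node": "2024-12-03T09:00:27.721Z",
    "kubelet: shutdown event detected": "2024-12-03T09:01:59.602Z",
    "karpenter: deleted node": "2024-12-03T09:02:40.974Z",
    "karpenter: deleted nodeclaim": "2024-12-03T09:02:41.312Z",
}


def parse(ts: str) -> datetime:
    """Parse an ISO-like UTC timestamp such as 2024-12-03T09:00:27.577Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")


def timeline(events: dict) -> list:
    """Return (label, seconds since the earliest event), sorted by time."""
    ordered = sorted((parse(ts), label) for label, ts in events.items())
    start = ordered[0][0]
    return [(label, (t - start).total_seconds()) for t, label in ordered]


if __name__ == "__main__":
    for label, offset in timeline(EVENTS):
        print(f"+{offset:8.3f}s  {label}")
```

By these timestamps, the kubelet detects the shutdown event roughly 92 seconds after Karpenter taints the node and roughly 41 seconds before Karpenter logs "deleted node".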

Expected Behavior:

  • As described in the Karpenter documentation, Karpenter should first cordon and drain the node, and only then trigger node termination.

Versions:

  • Chart Version: v1.0.6
  • Kubernetes Version (kubectl version): 1.28.12
  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Metadata

Labels

  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/bug: Categorizes issue or PR as related to a bug.
  • triage/needs-information: Indicates an issue needs more information in order to work on it.