volume not being reattached to healthy node when initial node shutdown #720

evgenii-avdiukhin · 2024-09-11T13:57:53Z

TL;DR

I configured the csi-driver and deployed a Jenkins StatefulSet to test it.
The volume was created automatically and attached to worker-1.
The Jenkins pod was then scheduled on the same node.
Next, I wanted to test how reattachment works,
so I shut down the worker-1 Hetzner VM.
Nothing happened: the volume was not reattached.
Since tolerations are configured, the Jenkins pod goes into Terminating and then tries to schedule on the node that has the PVC, but it can't, because the PVC is still attached to the dead node.
What am I doing wrong? Or is this behaviour not supported by the csi-driver?

Expected behavior

The Hetzner volume is moved to a healthy node and the pod is scheduled successfully.

Observed behavior

The volume is not reattached.

Minimal working example

No response

Log output

No response

Additional information

No response

mpepping · 2024-09-21T19:18:03Z

By design, StatefulSet pods are not rescheduled to a new node when the original node becomes unavailable. Kubernetes cannot distinguish between a deliberate shutdown and a network partition, so it marks the pods on the down node as Unknown rather than deleting them. That is what you see when you power off or shut down a node. With a StatefulSet, rescheduling then requires manual intervention.
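
One common reading of "manual rescheduling" here is force-deleting the stuck pod so the StatefulSet controller can create a replacement. The sketch below only builds and prints that `kubectl` command. The pod name `jenkins-0` and namespace `jenkins` are taken from the events in this thread; the helper names `force_delete_pod_command` and `run` are made up for illustration. It assumes you have already confirmed the node is really down, and the volume may still need to detach from the old node before the new pod can start.

```python
from __future__ import annotations

import shlex
import subprocess


def force_delete_pod_command(pod: str, namespace: str) -> list[str]:
    """Build the kubectl command that force-deletes a pod stuck on a dead node."""
    return [
        "kubectl", "delete", "pod", pod,
        "--namespace", namespace,
        "--grace-period=0",
        "--force",
    ]


def run(command: list[str], dry_run: bool = True) -> str:
    """Return the rendered command; execute it only when dry_run is False."""
    rendered = shlex.join(command)
    if not dry_run:
        # Raises CalledProcessError if kubectl exits with a non-zero status.
        subprocess.run(command, check=True)
    return rendered


if __name__ == "__main__":
    # Dry run by default: prints the command instead of touching the cluster.
    print(run(force_delete_pod_command("jenkins-0", "jenkins")))
```

Keeping `dry_run=True` as the default means nothing is deleted until you have reviewed the printed command.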

However, if you drain or delete the node running the Jenkins pod, it all works as you would expect. Draining or deleting the node gives the most responsive behavior. You may see some 'exclusively attached' (Multi-Attach) warnings on the workload, but all in all the PVC re-attaches in a reasonable time:

Normal   Scheduled               22s   default-scheduler        Successfully assigned jenkins/jenkins-0 to dev-pool-small-static-worker2
Warning  FailedAttachVolume      23s   attachdetach-controller  Multi-Attach error for volume "pvc-8b1a23a1-cc85-4b09-9231-2c963885e366" Volume is already exclusively attached to one node and can't be attached to another 
Normal   SuccessfulAttachVolume  0s    attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-8b1a23a1-cc85-4b09-9231-2c963885e366"
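
To check this pattern in a longer event list, a small parser can help. The following is a minimal sketch, not part of the csi-driver or kubectl: it assumes events are pasted as whitespace-separated columns exactly like the three lines above (type, reason, age, source, message), that the age column is a single token, and that later lines are more recent. `parse_events` and `attach_status` are illustrative names.

```python
import re
from typing import Dict, List, NamedTuple

EVENTS = """\
Normal   Scheduled               22s   default-scheduler        Successfully assigned jenkins/jenkins-0 to dev-pool-small-static-worker2
Warning  FailedAttachVolume      23s   attachdetach-controller  Multi-Attach error for volume "pvc-8b1a23a1-cc85-4b09-9231-2c963885e366" Volume is already exclusively attached to one node and can't be attached to another
Normal   SuccessfulAttachVolume  0s    attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-8b1a23a1-cc85-4b09-9231-2c963885e366"
"""

# Matches the quoted volume name inside an attach/detach event message.
VOLUME_RE = re.compile(r'volume "([^"]+)"')


class Event(NamedTuple):
    kind: str     # "Normal" or "Warning"
    reason: str   # e.g. "FailedAttachVolume"
    age: str      # e.g. "23s"
    source: str   # component that emitted the event
    message: str


def parse_events(text: str) -> List[Event]:
    """Split each non-empty line into four columns plus the free-text message."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        kind, reason, age, source, message = line.split(None, 4)
        events.append(Event(kind, reason, age, source, message.strip()))
    return events


def attach_status(events: List[Event]) -> Dict[str, str]:
    """Map each volume to the reason of its last listed attach event."""
    status: Dict[str, str] = {}
    for event in events:
        if event.source != "attachdetach-controller":
            continue
        match = VOLUME_RE.search(event.message)
        if match:
            # Later lines overwrite earlier ones for the same volume.
            status[match.group(1)] = event.reason
    return status


if __name__ == "__main__":
    print(attach_status(parse_events(EVENTS)))
```

A volume whose final status is `SuccessfulAttachVolume` recovered after the Multi-Attach warning. One that stays at `FailedAttachVolume` is still held by the old node.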

evgenii-avdiukhin added the bug (Something isn't working) label · Sep 11, 2024
