Description
In a multi-node cluster, deleting a devworkspace that uses the per-user/common PVC strategy can result in the PVC cleanup pod being scheduled on a different node than the one where the PVC is mounted. Since PVCs are created as ReadWriteOnce, only a single node can mount the PVC, so the cleanup pod fails to start with a PVC mount error. This causes the devworkspace to remain in a terminating state indefinitely.
Since the node a pod is scheduled on cannot be changed after the pod has been created, the only way to get the workspace deleted is to delete the cleanup pod and let it be automatically re-created until it is assigned to the node where the PVC is mounted.
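To make the workaround concrete, here is a minimal sketch of the check behind it. It is not the operator's actual code: the function name `needs_recreation` and the node names are hypothetical, and it assumes you have already looked up the node the cleanup pod landed on and the node where the PVC is currently mounted.

```python
from typing import Optional


def needs_recreation(cleanup_pod_node: Optional[str], pvc_node: Optional[str]) -> bool:
    """Return True if the cleanup pod should be deleted so it can be rescheduled.

    Hypothetical helper for illustration only.
    cleanup_pod_node: node the cleanup pod was scheduled on (None if unscheduled).
    pvc_node: node where the ReadWriteOnce PVC is currently mounted (None if unmounted).
    """
    if cleanup_pod_node is None or pvc_node is None:
        # Either the pod has no node yet, or the PVC is not mounted anywhere,
        # so there is no node mismatch to correct.
        return False
    # A ReadWriteOnce PVC can only be mounted by one node, so a pod on any
    # other node cannot mount it and has to be re-created.
    return cleanup_pod_node != pvc_node
```

With this check, a pod on `node-b` while the PVC is mounted on `node-a` would be deleted and left to be re-created, while a pod on `node-a` would be left alone to finish the cleanup.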
What's odd is that we are already applying a node selector label to the cleanup pod. Perhaps there are cases where the namespace is missing the node selector annotation? CC: @musienko-maxim
How To Reproduce
Does not always occur; requires a multi-node cluster.
- Create a devworkspace using the per-user/common storage strategy
- Delete the devworkspace
- If the cleanup-workspace pod is scheduled on a different node than the one where the PVC is mounted, the pod will fail to start and the devworkspace will remain in the terminating state
Expected behavior
The cleanup-workspace pod is scheduled on the same node where the PVC is mounted and completes successfully. The devworkspace is then deleted successfully.
Additional context
Encountered this while testing on @musienko-maxim's OCP 4.15 test cluster.