Description
In a multi-node cluster, when a devworkspace that uses the per-user/common PVC strategy is deleted, the PVC cleanup pod can be scheduled on a different node than the one where the PVC is mounted. Because the PVCs are created as ReadWriteOnce, only a single node can mount the PVC, so the cleanup pod fails to start with a PVC mount error. As a result, the devworkspace remains in a terminating state indefinitely.
Because the node a pod is scheduled on cannot be changed after the pod has been created, the only way to get the workspace deleted is to delete the cleanup pod and let it be re-created automatically, repeating this until it is assigned to the node where the PVC is mounted.
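To make the workaround easier to apply, here is a minimal sketch of how one might detect a misplaced cleanup pod from a pod listing such as the `items` array of `kubectl get pods -o json`. The pod names, the `cleanup-` name prefix, and the PVC name `claim-devworkspace` are assumptions used for illustration; this is not the operator's own logic.

```python
def pvc_node(pods, claim_name):
    """Return the node of a running pod that mounts claim_name, or None."""
    for pod in pods:
        if pod.get("status", {}).get("phase") != "Running":
            continue
        for volume in pod.get("spec", {}).get("volumes", []):
            claim = volume.get("persistentVolumeClaim", {})
            if claim.get("claimName") == claim_name:
                return pod["spec"].get("nodeName")
    return None


def misplaced_cleanup_pods(pods, claim_name, cleanup_prefix="cleanup-"):
    """Names of cleanup pods scheduled on a node other than the PVC's node."""
    node = pvc_node(pods, claim_name)
    if node is None:
        return []  # PVC is not mounted by a running pod, nothing to compare
    return [
        pod["metadata"]["name"]
        for pod in pods
        if pod["metadata"]["name"].startswith(cleanup_prefix)
        and pod.get("spec", {}).get("nodeName") not in (None, node)
    ]


# Hypothetical listing: one running workspace on node-1, cleanup pod on node-2.
PVC_VOLUME = {"name": "data", "persistentVolumeClaim": {"claimName": "claim-devworkspace"}}
EXAMPLE_PODS = [
    {
        "metadata": {"name": "workspace-a"},
        "spec": {"nodeName": "node-1", "volumes": [PVC_VOLUME]},
        "status": {"phase": "Running"},
    },
    {
        "metadata": {"name": "cleanup-workspace-b"},
        "spec": {"nodeName": "node-2", "volumes": [PVC_VOLUME]},
        "status": {"phase": "Pending"},
    },
]

print(misplaced_cleanup_pods(EXAMPLE_PODS, "claim-devworkspace"))
```

Any pod name this returns is a candidate for deletion, so that it is re-created and rescheduled.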
How To Reproduce
Does not always occur; requires a multi-node cluster.
1. Create a devworkspace using the per-user/common storage strategy.
2. Delete the devworkspace.
3. If the cleanup-workspace pod is scheduled on a different node than the one where the PVC is mounted, the pod fails to start and the devworkspace remains in the terminating state.
Expected behavior
The cleanup-workspace pod is scheduled on the same node where the PVC is mounted and terminates successfully. The devworkspace is then deleted successfully.
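One way this behavior could be achieved is to pin the cleanup pod to the node that already mounts the PVC. The sketch below adds the well-known `kubernetes.io/hostname` label to a pod spec's `nodeSelector`. The helper name `pin_to_node` is hypothetical, and this is only an illustration of the idea, not necessarily how the operator should implement it.

```python
def pin_to_node(pod_spec, node_name):
    """Return a copy of pod_spec whose nodeSelector targets node_name.

    Assumes the node's kubernetes.io/hostname label equals its name,
    which is common but not guaranteed on every cluster.
    """
    pinned = dict(pod_spec)
    selector = dict(pod_spec.get("nodeSelector", {}))
    selector["kubernetes.io/hostname"] = node_name
    pinned["nodeSelector"] = selector
    return pinned


# Hypothetical cleanup pod spec that already carries a broader selector.
cleanup_spec = {"containers": [], "nodeSelector": {"role": "dev"}}
pinned_spec = pin_to_node(cleanup_spec, "node-1")
```

Existing selector entries are kept, so a selector configured elsewhere would still apply.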
Additional context
Encountered this while testing on @musienko-maxim's OCP 4.15 test cluster.
It seems like this annotation is supposed to be applied by Che, as configured by this CheCluster CR field.
However, if the node selector matches multiple nodes, the cleanup job may still be assigned to a different node than the one where the PVC is mounted.
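To illustrate why a node selector alone may not be enough, here is a small sketch that checks how many nodes a selector matches. It assumes the annotation value is a JSON map of label key/value pairs. The node names and labels are made up.

```python
import json


def nodes_matching(selector, nodes):
    """Names of nodes whose labels contain every key/value pair in selector."""
    return [
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in selector.items())
    ]


def selector_is_ambiguous(annotation_value, nodes):
    """True if the selector matches more than one node."""
    selector = json.loads(annotation_value)  # assumed JSON-encoded label map
    return len(nodes_matching(selector, nodes)) > 1


# Hypothetical cluster: two nodes share the "role: dev" label.
EXAMPLE_NODES = {
    "node-1": {"role": "dev", "kubernetes.io/hostname": "node-1"},
    "node-2": {"role": "dev", "kubernetes.io/hostname": "node-2"},
    "node-3": {"role": "infra", "kubernetes.io/hostname": "node-3"},
}

print(selector_is_ambiguous('{"role": "dev"}', EXAMPLE_NODES))
```

When the selector is ambiguous in this sense, the scheduler is free to place the cleanup pod on any matching node, including one that cannot mount the ReadWriteOnce PVC.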
What's odd is that we are already applying a node selector label to the cleanup pod. Perhaps there are cases where the namespace is missing the node selector annotation? CC: @musienko-maxim