[CSI] Trigger StopEndpoint if StartEndpoint has failed with GRPC Timeout error #2801

Open
antonmyagkov opened this issue Jan 6, 2025 · 0 comments

@antonmyagkov (Collaborator)

All StartEndpoint/StopEndpoint requests land in the same queue: nbs/cloud/blockstore/libs/storage/service/volume_session_actor_mount.cpp (main branch of ydb-platform/nbs). NBS handles only one request at a time.
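
To illustrate what "one request at a time" means for callers, here is a minimal sketch that assumes a single worker draining a FIFO queue. The `endpointRequest` type and the `serve` function are hypothetical and do not reflect the actor code in `volume_session_actor_mount.cpp`; the point is only that a later StopEndpoint cannot overtake an earlier StartEndpoint.

```go
package main

import "fmt"

// endpointRequest is a hypothetical stand-in for a queued Start/StopEndpoint call.
type endpointRequest struct {
	kind   string // "start" or "stop"
	volume string
}

// serve drains the queue with a single worker, so requests are handled
// strictly one at a time, in arrival order.
func serve(queue <-chan endpointRequest) []string {
	var handled []string
	for req := range queue {
		handled = append(handled, req.kind+":"+req.volume)
	}
	return handled
}

func main() {
	queue := make(chan endpointRequest, 4)
	queue <- endpointRequest{"start", "vol-1"}
	queue <- endpointRequest{"stop", "vol-1"}
	close(queue)
	fmt.Println(serve(queue)) // [start:vol-1 stop:vol-1]
}
```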

A StartEndpoint request from the CSI driver can fail with a timeout, but the request already in the queue keeps retrying to start the endpoint for longer. This leaves the delete-volume operation hanging, because the CSI driver never sends a StopEndpoint request: NodePublishVolume failed, so the driver does not send NodeUnpublishVolume.

Solution:
Send a StopEndpoint request (in NodePublishVolume/NodeStageVolume) if the StartEndpoint request fails with a gRPC timeout error.
