[CSI] trigger StopEndpoint if StartEndpoint has failed with GRPC Timeout error #2802

antonmyagkov · 2025-01-06T12:42:02Z

All Start/StopEndpoint requests land in the same queue (see nbs/cloud/blockstore/libs/storage/service/volume_session_actor_mount.cpp at main in ydb-platform/nbs). NBS handles only one request at a time.

A StartEndpoint request from the CSI driver can fail with a timeout, while the request already in the queue keeps retrying to start the endpoint for longer. This makes the delete volume operation hang, because the CSI driver doesn't send a StopEndpoint request (NodePublishVolume failed, so NodeUnpublishVolume is not expected to be sent).

Solution:
Send a StopEndpoint request (in NodePublishVolume/NodeStageVolume) if the StartEndpoint request fails with a gRPC timeout error.

github-actions · 2025-01-06T13:22:15Z

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 0f788d5.

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED
3560	3560	0	0	0	0

github-actions · 2025-01-07T17:42:49Z

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 08add02.

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED
3562	3562	0	0	0	0

github-actions · 2025-01-12T17:13:15Z

Note

This is an automated comment that will be appended during run.

🟢 linux-x86_64-relwithdebinfo: all tests PASSED for commit 08add02.

TESTS	PASSED	ERRORS	FAILED	SKIPPED	MUTED
3571	3571	0	0	0	0

issue-2801: trigger StopEndpoint if StartEndpoint has failed with GRPC Timeout error

0f788d5

antonmyagkov added the blockstore label (Add this label to run only cloud/blockstore build and tests on PR) Jan 6, 2025

add tests

08add02

antonmyagkov force-pushed the users/myagkov/issue-2801 branch from a46ee7f to 08add02 January 7, 2025 17:06

antonmyagkov requested a review from tpashkin January 7, 2025 17:59

antonmyagkov added the rebase label (Add this label if you want to rebase your PR for test run) Jan 12, 2025

tpashkin approved these changes Jan 13, 2025

drbasic approved these changes Jan 13, 2025

antonmyagkov merged commit 48a64e3 into main Jan 13, 2025
21 of 22 checks passed

antonmyagkov deleted the users/myagkov/issue-2801 branch January 13, 2025 16:23
