Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(controller): Fix race of csi controller calls on the same volume #588

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

Lucaber
Copy link

@Lucaber Lucaber commented Sep 17, 2024

Why is this PR required? What issue does it fix?:

When a CreateVolume requests takes longer then 10 seconds the request runs into a client-side timeout and gets retied by kubelet. While processing the new request, the old request is still being canceled on the server-side and the created ZFSVolume CR gets deleted.

1 2
CreateVolume call
ZFSVolume CR created
waiting for volume creation
client timeout + context cancel
CreateVolume call
ZFSVolume CR already exists
ZFSVolume CR deleted
Return ERR Return OK without creating a ZFSVolume
I0916 15:01:49.227721       1 grpc.go:72] GRPC call: /csi.v1.Controller/CreateVolume requests {"accessibility_requirements":{"preferred":[{"segments":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"i04-1-b-node-i041b-0-0","kubernetes.io/os":"linux","m3.services/cluster":"i04-1-b","m3.services/worker-group":"i041b-0","node.systems.mittwald.cloud/placement-group":"i041b-0","openebs.io/nodeid":"i04-1-b-node-i041b-0-0","openebs.io/nodename":"i04-1-b-node-i041b-0-0"}}],"requisite":[{"segments":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"i04-1-b-node-i041b-0-0","kubernetes.io/os":"linux","m3.services/cluster":"i04-1-b","m3.services/worker-group":"i041b-0","node.systems.mittwald.cloud/placement-group":"i041b-0","openebs.io/nodeid":"i04-1-b-node-i041b-0-0","openebs.io/nodename":"i04-1-b-node-i041b-0-0"}}]},"capacity_range":{"required_bytes":5368709120},"name":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457","parameters":{"compression":"off","csi.storage.k8s.io/pv/name":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457","csi.storage.k8s.io/pvc/name":"test-pvc0","csi.storage.k8s.io/pvc/namespace":"default","dedup":"off","fstype":"zfs","poolname":"kluster-pool/i04-1-b-128k","recordsize":"128k","shared":"yes","thinprovision":"yes"},"volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"zfs"}},"access_mode":{"mode":1}}]}
I0916 15:01:58.853180       1 controller.go:289] zfs: trying volume creation kluster-pool/i04-1-b-128k/pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457 on node [i04-1-b-node-i041b-0-0]
I0916 15:01:59.104207       1 volume.go:139] zfs: waiting for volume kluster-pool/i04-1-b-128k/pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457 to be created on nodeid i04-1-b-node-i041b-0-0
I0916 15:02:00.105273       1 volume.go:170] zfs: volume kluster-pool/i04-1-b-128k/pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457 provisioning failed on nodeid i04-1-b-node-i041b-0-0 err: zfs: context deadline reached
I0916 15:02:00.235113       1 grpc.go:72] GRPC call: /csi.v1.Controller/CreateVolume requests {"accessibility_requirements":{"preferred":[{"segments":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"i04-1-b-node-i041b-0-0","kubernetes.io/os":"linux","m3.services/cluster":"i04-1-b","m3.services/worker-group":"i041b-0","node.systems.mittwald.cloud/placement-group":"i041b-0","openebs.io/nodeid":"i04-1-b-node-i041b-0-0","openebs.io/nodename":"i04-1-b-node-i041b-0-0"}}],"requisite":[{"segments":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"i04-1-b-node-i041b-0-0","kubernetes.io/os":"linux","m3.services/cluster":"i04-1-b","m3.services/worker-group":"i041b-0","node.systems.mittwald.cloud/placement-group":"i041b-0","openebs.io/nodeid":"i04-1-b-node-i041b-0-0","openebs.io/nodename":"i04-1-b-node-i041b-0-0"}}]},"capacity_range":{"required_bytes":5368709120},"name":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457","parameters":{"compression":"off","csi.storage.k8s.io/pv/name":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457","csi.storage.k8s.io/pvc/name":"test-pvc0","csi.storage.k8s.io/pvc/namespace":"default","dedup":"off","fstype":"zfs","poolname":"kluster-pool/i04-1-b-128k","recordsize":"128k","shared":"yes","thinprovision":"yes"},"volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"zfs"}},"access_mode":{"mode":1}}]}
I0916 15:02:00.449153       1 volume.go:221] zfs: deleted the volume pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457
I0916 15:02:00.449299       1 controller.go:466] created the volume kluster-pool/i04-1-b-128k/pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457 on node i04-1-b-node-i041b-0-0
I0916 15:02:00.553778       1 grpc.go:81] GRPC response: {"volume":{"accessible_topology":[{"segments":{"openebs.io/nodeid":"i04-1-b-node-i041b-0-0"}}],"capacity_bytes":5368709120,"volume_context":{"openebs.io/cas-type":"localpv-zfs","openebs.io/poolname":"kluster-pool/i04-1-b-128k"},"volume_id":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457"}}

What this PR does?: Adds a per volume mutex to csi calls to prevent simultaneous requests

Does this PR require any upgrade changes?: No

If the changes in this PR are manually verified, list down the scenarios covered::

Any additional information for your reviewer? :
Mention if this PR is part of any design or a continuation of previous PRs

Checklist:

  • Fixes #
  • PR Title follows the convention of <type>(<scope>): <subject>
  • Has the change log section been updated?
  • Commit has unit tests
  • Commit has integration tests
  • (Optional) Are upgrade changes included in this PR? If not, mention the issue/PR to track:
  • (Optional) If documentation changes are required, which issue on https://github.com/openebs/website is used to track them:

When a CreateVolume requests takes longer then 10 seconds the request
runs into a client-side timeout and gets retied by kubelet.
While processing the new request, the old request is still being canceled
on the server-side and the created ZFSVolume CR gets deleted.

1|2
CreateVolume|
ZFSVolume CR created|
waiting for volume creation|
client timeout + context cancel|
|CreateVolume
|ZFSVolume CR already exists
ZFSVolume CR deleted|
Return ERR|Return OK without creating a ZFSVolume

```
I0916 15:01:49.227721       1 grpc.go:72] GRPC call: /csi.v1.Controller/CreateVolume requests {"accessibility_requirements":{"preferred":[{"segments":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"i04-1-b-node-i041b-0-0","kubernetes.io/os":"linux","m3.services/cluster":"i04-1-b","m3.services/worker-group":"i041b-0","node.systems.mittwald.cloud/placement-group":"i041b-0","openebs.io/nodeid":"i04-1-b-node-i041b-0-0","openebs.io/nodename":"i04-1-b-node-i041b-0-0"}}],"requisite":[{"segments":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"i04-1-b-node-i041b-0-0","kubernetes.io/os":"linux","m3.services/cluster":"i04-1-b","m3.services/worker-group":"i041b-0","node.systems.mittwald.cloud/placement-group":"i041b-0","openebs.io/nodeid":"i04-1-b-node-i041b-0-0","openebs.io/nodename":"i04-1-b-node-i041b-0-0"}}]},"capacity_range":{"required_bytes":5368709120},"name":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457","parameters":{"compression":"off","csi.storage.k8s.io/pv/name":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457","csi.storage.k8s.io/pvc/name":"test-pvc0","csi.storage.k8s.io/pvc/namespace":"default","dedup":"off","fstype":"zfs","poolname":"kluster-pool/i04-1-b-128k","recordsize":"128k","shared":"yes","thinprovision":"yes"},"volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"zfs"}},"access_mode":{"mode":1}}]}
I0916 15:01:58.853180       1 controller.go:289] zfs: trying volume creation kluster-pool/i04-1-b-128k/pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457 on node [i04-1-b-node-i041b-0-0]
I0916 15:01:59.104207       1 volume.go:139] zfs: waiting for volume kluster-pool/i04-1-b-128k/pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457 to be created on nodeid i04-1-b-node-i041b-0-0
I0916 15:02:00.105273       1 volume.go:170] zfs: volume kluster-pool/i04-1-b-128k/pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457 provisioning failed on nodeid i04-1-b-node-i041b-0-0 err: zfs: context deadline reached
I0916 15:02:00.235113       1 grpc.go:72] GRPC call: /csi.v1.Controller/CreateVolume requests {"accessibility_requirements":{"preferred":[{"segments":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"i04-1-b-node-i041b-0-0","kubernetes.io/os":"linux","m3.services/cluster":"i04-1-b","m3.services/worker-group":"i041b-0","node.systems.mittwald.cloud/placement-group":"i041b-0","openebs.io/nodeid":"i04-1-b-node-i041b-0-0","openebs.io/nodename":"i04-1-b-node-i041b-0-0"}}],"requisite":[{"segments":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"i04-1-b-node-i041b-0-0","kubernetes.io/os":"linux","m3.services/cluster":"i04-1-b","m3.services/worker-group":"i041b-0","node.systems.mittwald.cloud/placement-group":"i041b-0","openebs.io/nodeid":"i04-1-b-node-i041b-0-0","openebs.io/nodename":"i04-1-b-node-i041b-0-0"}}]},"capacity_range":{"required_bytes":5368709120},"name":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457","parameters":{"compression":"off","csi.storage.k8s.io/pv/name":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457","csi.storage.k8s.io/pvc/name":"test-pvc0","csi.storage.k8s.io/pvc/namespace":"default","dedup":"off","fstype":"zfs","poolname":"kluster-pool/i04-1-b-128k","recordsize":"128k","shared":"yes","thinprovision":"yes"},"volume_capabilities":[{"AccessType":{"Mount":{"fs_type":"zfs"}},"access_mode":{"mode":1}}]}
I0916 15:02:00.449153       1 volume.go:221] zfs: deleted the volume pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457
I0916 15:02:00.449299       1 controller.go:466] created the volume kluster-pool/i04-1-b-128k/pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457 on node i04-1-b-node-i041b-0-0
I0916 15:02:00.553778       1 grpc.go:81] GRPC response: {"volume":{"accessible_topology":[{"segments":{"openebs.io/nodeid":"i04-1-b-node-i041b-0-0"}}],"capacity_bytes":5368709120,"volume_context":{"openebs.io/cas-type":"localpv-zfs","openebs.io/poolname":"kluster-pool/i04-1-b-128k"},"volume_id":"pvc-eb11c446-f26c-4333-b3da-4ecb9c5b2457"}}
```

Signed-off-by: Luca Berneking <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant