Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changes to support hyperdisk multi-writer mode #1901

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

karkunpavan
Copy link

@karkunpavan karkunpavan commented Jan 10, 2025

What type of PR is this?
/kind bug

What this PR does / why we need it:
Hyper disk multi-writer support is now in compute/v1 but the persistent disk csi driver still calls v0.beta which leads a error while creating a hyperdisk with multi-writer mode.

Error:

Warning ProvisioningFailed 31m (x15 over 57m) pd.csi.storage.gke.io_wduan-1030a-g-vdnqh-master-0_dc46e9f6-1725-4f15-922c-99f0e281e102 failed to provision volume with StorageClass
"balanced-storage": rpc error: code = InvalidArgument desc = CreateVolume failed: rpc error: code = InvalidArgument desc = CreateVolume failed to create single zonal disk pvc-0b65d680-636b-46c6-876d-d3a6c412c3ef: failed to insert zonal disk: unknown Insert disk error:
googleapi: Error 400: Invalid value for field 'resource.multiWriter': 'true'.
Cannot specify the multi writer field for 'hyperdisk-balanced' disks,
please use access mode instead., invalid%!v(MISSING)

To fix this here are the changes done:

  1. Add AccessMode param to DiskParameters struct
  2. Initialize the AccessMode field to the input value in ExtractAndDefaultParameters(). We do not have an explicit default - that will be decided by the PD layer depending on disk type.
  3. The parameter is read by controller in createVolumeInternal()
  4. The accessMode before calling createSingleZoneDisk() and createRegionalDisk() is currently set to "", the change fixes createsingledisk() to read from the parameter passed by the user.
  5. insertZonalDisk() currently calls beta APIs when multi-writer is set, the fix is to use v1 API instead.
  6. insertRegionalDisk() currently calls beta APIs when multi-writer is set, the fix is to use v1 API instead.
  7. Enable the multiwriter e2e tests to test creation and lifecycle of multi-writer disks.
  8. Delete stale comments about hyperdisk and multi-writer support

Which issue(s) this PR fixes:

Fixes #1863

Special notes for your reviewer:

  • This issue and the fix proposal has been discussed with Matthew and Hung

Does this PR introduce a user-facing change?:

This PR introduces READ_WRITE_MANY AccessMode also know as Multi-writer support for hyperdisks. Refer [here](https://cloud.google.com/compute/docs/disks/sharing-disks-between-vms#hd-multi-writer) for details about the multi-writer mode support, disk and machine types that support it.

Users can enable this mode by setting `access-mode: "READ_WRITE_MANY"` for the corresponding disk section to attach it in multi-writer mode.

@k8s-ci-robot
Copy link
Contributor

Adding the "do-not-merge/release-note-label-needed" label because no release-note block was detected, please follow our release note process to remove it.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 10, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: karkunpavan
Once this PR has been reviewed and has the lgtm label, please assign msau42 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 10, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @karkunpavan. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jan 10, 2025
Copy link

linux-foundation-easycla bot commented Jan 10, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@k8s-ci-robot k8s-ci-robot added cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels Jan 10, 2025
@mattcary
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 13, 2025
* Changes update the tests to use two contexts, one for multiwriter and one for the existing tests. This was deemed necessary as only some disks can support multi-writer, and only some VM shapes can support said disks.
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jan 13, 2025
@karkunpavan
Copy link
Author

Update on other tests we ran manually: The test follows on the lines of the original test that reproduced this issue here: https://github.com/scanfield/mw-repro. I was able to reproduce the issue without the fix and that the issue went away with the fix. Pods can write and read simultaneously.

@hungnguyen243
Copy link
Contributor

Overall looks good to me. Minor suggestion: Could we also remove the GetMultiWriter()'s invocation in fake_gce.go (

) as well? I also think we can remove this method altogether as it's no longer accurate and will become disused after that.

* Removing alpha disk from tests.

* Changes setup the test to run with hyperdisk extreme.

* Updates the disk type back to balanced, as HDX doesn't support multi writer.

* Moving over to m1 megamem as thats the only type of machine that can support all needed disk types.

* Changes update the tests to use two contexts, one for multiwriter and one for the existing tests. This was deemed necessary as only some disks can support multi-writer, and only some VM shapes can support said disks.

* Changes update the tests to use two contexts, one for multiwriter and one for the existing tests. This was deemed necessary as only some disks can support multi-writer, and only some VM shapes can support said disks.

* Fixing some git oddness

* Fixing some formatting.

* More formatting fixes.

* Hopefully last changes for formatting.

* Fixing linting issue.

* Cleaning up un-used / un-needed GetMultiWriter test function.
@hungnguyen243
Copy link
Contributor

@sjswerdlow Could you remove the whole conditional in fake_gce.go, instead of just !resp.GetMultiWriter()? I think that's why the last commit caused the test failure.

dfajmon pushed a commit to dfajmon/gcp-pd-csi-driver that referenced this pull request Jan 23, 2025
dfajmon pushed a commit to dfajmon/gcp-pd-csi-driver that referenced this pull request Jan 23, 2025
dfajmon pushed a commit to dfajmon/gcp-pd-csi-driver that referenced this pull request Jan 23, 2025
@hungnguyen243
Copy link
Contributor

/lgtm

@k8s-ci-robot
Copy link
Contributor

@hungnguyen243: changing LGTM is restricted to collaborators

In response to this:

/lgtm

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@mattcary
Copy link
Contributor

We need a release note for this -- supporting hyperdisk multiwriter is definitely user facing

@karkunpavan
Copy link
Author

Updated the release note section with user visible changes.

@mattcary
Copy link
Contributor

Updated the release note section with user visible changes.
👍🏾 thx

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. kind/bug Categorizes issue or PR as related to a bug. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
5 participants