Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(controller): provision zfs volume if zfs volume already exists #576

Merged
merged 1 commit into from
Aug 12, 2024

Conversation

AChangFeng
Copy link
Contributor

@AChangFeng AChangFeng commented Aug 6, 2024

Why is this PR required? What issue does it fix?:

  • When the ZFSVolume is actually ready but the PV has not been created due to timeout or controller crash, the status of ZFSVolume will remain in Pending, which prevents the PV from being created and keeps the PVC in Pending as well.

What this PR does?:

  • Clear finalizers of ZFSVolume in zfs.ProvisionVolume()
  • When volume.ZVController.syncZV() is called, it will update the status of the ready ZFSVolume to the actual status instead of keep calling zfs.SetVolumeProp()

Does this PR require any upgrade changes?:
No

If the changes in this PR are manually verified, list down the scenarios covered::

  • This is an intermittent issue, and it is difficult to reproduce the scenarios manually

Any additional information for your reviewer? :
No

Checklist:

  • Fixes #
  • PR Title follows the convention of <type>(<scope>): <subject>
  • Has the change log section been updated?
  • Commit has unit tests
  • Commit has integration tests
  • (Optional) Are upgrade changes included in this PR? If not, mention the issue/PR to track:
  • (Optional) If documentation changes are required, which issue on https://github.com/openebs/website is used to track them:

@AChangFeng AChangFeng force-pushed the develop branch 5 times, most recently from 6781257 to b6d2974 Compare August 6, 2024 11:21
@codecov-commenter
Copy link

codecov-commenter commented Aug 7, 2024

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.37%. Comparing base (c389127) to head (bbf3cb6).
Report is 3 commits behind head on develop.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop     #576   +/-   ##
========================================
  Coverage    96.37%   96.37%           
========================================
  Files            1        1           
  Lines          496      496           
========================================
  Hits           478      478           
  Misses          14       14           
  Partials         4        4           
Flag Coverage Δ
bddtests 96.37% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Abhinandan-Purkait
Copy link
Member

@AChangFeng Can you give some more context on how updating the finalizers help here?

@AChangFeng
Copy link
Contributor Author

@AChangFeng Can you give some more context on how updating the finalizers help here?

  • If we don't reset finalizer, zfs.IsVolumeReady() always return true, this leading zfs.ZVController.syncZV() keep calling zfs.SetVolumeProp()
  • When we reset finalizer, zfs.IsVolumeReady() will return false, this leading zfs.ZVControlelr.syncZV() call zfs.UpdateZvolInfo()

@Abhinandan-Purkait
Copy link
Member

Abhinandan-Purkait commented Aug 7, 2024

@AChangFeng Can you give some more context on how updating the finalizers help here?

  • If we don't reset finalizer, zfs.IsVolumeReady() always return true, this leading zfs.ZVController.syncZV() keep calling zfs.SetVolumeProp()
  • When we reset finalizer, zfs.IsVolumeReady() will return false, this leading zfs.ZVControlelr.syncZV() call zfs.UpdateZvolInfo()

Isn't that bit of the code for older volumes that did not have the status? Shouldn't hit if the zfsvolume having a status is ready?

@AChangFeng
Copy link
Contributor Author

@AChangFeng Can you give some more context on how updating the finalizers help here?

  • If we don't reset finalizer, zfs.IsVolumeReady() always return true, this leading zfs.ZVController.syncZV() keep calling zfs.SetVolumeProp()
  • When we reset finalizer, zfs.IsVolumeReady() will return false, this leading zfs.ZVControlelr.syncZV() call zfs.UpdateZvolInfo()

Isn't that bit of the code for older volumes that did not have the status? Shouldn't hit if the zfsvolume having a status is ready?

Think about this corner case:

  1. zfs.ProvisionVolume() is called by controller to provision a zfs volume
  2. agent create a zfs volume then update status of ZFSVolume to Ready and append zfs.ZFSFinalizer to ZFSVolume
  3. agent create zfs volume too slow so that controller call zfs.ProvisionVolume again
  4. zfs.GetZFSVolume() call in zfs.ProvisionVolume return ZFSVolume and nil
  5. do not reset finalizers of ZFSVolume, but set status of ZFSVolume to pending
  6. agent will never update status of ZFSVolume to Ready or Failed because ZFSVolume already has zfs.ZFSFinalizer
  7. zfs.checkVolCreation() call in zfs.ProvisionVolume() keep return fmt.Errorf("zfs: context deadline reached")
  8. PV will never be provisioned

@Abhinandan-Purkait
Copy link
Member

Abhinandan-Purkait commented Aug 7, 2024

Think about this corner case:

  1. zfs.ProvisionVolume() is called by controller to provision a zfs volume
  2. agent create a zfs volume then update status of ZFSVolume to Ready and append zfs.ZFSFinalizer to ZFSVolume
  3. agent create zfs volume too slow so that controller call zfs.ProvisionVolume again
  4. zfs.GetZFSVolume() call in zfs.ProvisionVolume return ZFSVolume and nil
  5. do not reset finalizers of ZFSVolume, but set status of ZFSVolume to pending
  6. agent will never update status of ZFSVolume to Ready or Failed because ZFSVolume already has zfs.ZFSFinalizer
  7. zfs.checkVolCreation() call in zfs.ProvisionVolume() keep return fmt.Errorf("zfs: context deadline reached")
  8. PV will never be provisioned

Yes, I agree to the flow you suggested here. I was wondering rather than resetting the finalizer, what if we fix the IsVolumeReady method, to check finalizer if and only if the status is empty. If a status is a valid value and is Ready return true if not return false, we don't even go to the finalizer check section. Does this sound viable to you?

I believe the issue happens because we do the check using finalizer even if we have the status, which should have been just enough otherwise to figure out if the volume is ready.

@AChangFeng
Copy link
Contributor Author

Think about this corner case:

  1. zfs.ProvisionVolume() is called by controller to provision a zfs volume
  2. agent create a zfs volume then update status of ZFSVolume to Ready and append zfs.ZFSFinalizer to ZFSVolume
  3. agent create zfs volume too slow so that controller call zfs.ProvisionVolume again
  4. zfs.GetZFSVolume() call in zfs.ProvisionVolume return ZFSVolume and nil
  5. do not reset finalizers of ZFSVolume, but set status of ZFSVolume to pending
  6. agent will never update status of ZFSVolume to Ready or Failed because ZFSVolume already has zfs.ZFSFinalizer
  7. zfs.checkVolCreation() call in zfs.ProvisionVolume() keep return fmt.Errorf("zfs: context deadline reached")
  8. PV will never be provisioned

Yes, I agree to the flow you suggested here. I was wondering rather than resetting the finalizer, what if we fix the IsVolumeReady method, to check finalizer if and only if the status is empty. If a status is a valid value and is Ready return true if not return false, we don't even go to the finalizer check section. Does this sound viable to you?

I believe the issue happens because we do the check using finalizer even if we have the status, which should have been just enough otherwise to figure out if the volume is ready.

LGTM. I have fixed the IsVolumeReady method with unit tests covered.

pkg/zfs/volume.go Show resolved Hide resolved
Copy link
Member

@Abhinandan-Purkait Abhinandan-Purkait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Abhinandan-Purkait
Copy link
Member

@AChangFeng Are we good to merge?

@AChangFeng
Copy link
Contributor Author

@AChangFeng Are we good to merge?

Yes, everything looks good to me.

@Abhinandan-Purkait Abhinandan-Purkait merged commit 6c89f42 into openebs:develop Aug 12, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants