
@abhilashshetty04 (Member) commented Sep 3, 2025

When a Kubernetes node is removed from the cluster while it still hosts PVCs/LVMVolumes, those volumes become orphaned. If a PVC is then deleted, the LocalPV-LVM CSI controller deletes the corresponding LVMVolume CR; because the CR carries a finalizer, this only sets its deletion timestamp.

Normally, the node-agent running on the worker node removes the logical volume (LV) and then removes the finalizer from the CR. In this scenario, however, the node no longer exists, so no agent is running to perform the cleanup, and the CR is never deleted.

This fix uses the absence of the owner Kubernetes Node as the signal: when the node is not present, the CSI controller itself removes the finalizer so that the LVMVolume CR is cleaned up.

Fixes: #407
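A minimal, stdlib-only sketch of the decision this PR adds, assuming an in-memory fake in place of the API server. `Cluster`, `LVMVolume`, `requestDelete`, `DeleteVolume` and the finalizer string are hypothetical names for illustration, not the lvm-localpv code; the real controller talks to Kubernetes through `volbuilder` and `isNodePresent`.

```go
package main

import "fmt"

// lvmFinalizer is a hypothetical placeholder for the finalizer the node-agent
// manages; the real constant in lvm-localpv may have a different value.
const lvmFinalizer = "lvm.openebs.io/finalizer"

// LVMVolume is a simplified, hypothetical stand-in for the LVMVolume CR.
type LVMVolume struct {
	Name              string
	OwnerNodeID       string
	Finalizers        []string
	DeletionRequested bool // models metadata.deletionTimestamp being set
}

// Cluster is an in-memory fake of the API server state used by this sketch.
type Cluster struct {
	Nodes   map[string]bool       // nodes that still exist
	Volumes map[string]*LVMVolume // LVMVolume CRs by name
}

// requestDelete mimics API-server semantics: with finalizers present the
// object is only marked for deletion; otherwise it is removed immediately.
func (c *Cluster) requestDelete(v *LVMVolume) {
	if len(v.Finalizers) == 0 {
		delete(c.Volumes, v.Name)
		return
	}
	v.DeletionRequested = true
}

// DeleteVolume sketches the controller-side flow: request deletion, and if
// the owner node is gone, clear the finalizer because no node-agent will.
func (c *Cluster) DeleteVolume(name string) error {
	v, ok := c.Volumes[name]
	if !ok {
		return nil // already gone: report success so retries are idempotent
	}
	c.requestDelete(v)
	if _, stillThere := c.Volumes[name]; stillThere && !c.Nodes[v.OwnerNodeID] {
		fmt.Printf("node %s is not present, removing finalizer for %s\n", v.OwnerNodeID, name)
		v.Finalizers = nil
		delete(c.Volumes, name) // a marked object with no finalizers is removed
	}
	return nil
}

func main() {
	c := &Cluster{
		Nodes: map[string]bool{"node-a": true},
		Volumes: map[string]*LVMVolume{
			"pvc-1": {Name: "pvc-1", OwnerNodeID: "node-a", Finalizers: []string{lvmFinalizer}},
			"pvc-2": {Name: "pvc-2", OwnerNodeID: "node-gone", Finalizers: []string{lvmFinalizer}},
		},
	}
	_ = c.DeleteVolume("pvc-1") // node-agent on node-a still has to run lvremove
	_ = c.DeleteVolume("pvc-2") // owner node is gone: controller finishes the delete
	_, live1 := c.Volumes["pvc-1"]
	_, live2 := c.Volumes["pvc-2"]
	fmt.Println(live1, live2) // true false
}
```

Treating an already-missing CR as success is what keeps repeated DeleteVolume calls idempotent in this model.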

@abhilashshetty04 requested a review from a team as a code owner September 3, 2025 08:43
@abhilashshetty04 changed the title from "make delete call idempotent when node is removed from cluster" to "make delete volume idempotent when node is removed from cluster" Sep 3, 2025
@abhilashshetty04 force-pushed the lvmvolume_leak_fix branch 3 times, most recently from 1671fcc to 9b11bec September 5, 2025 06:55
@Abhinandan-Purkait (Member) left a comment

LGTM

err = volbuilder.NewKubeclient().WithNamespace(LvmNamespace).Delete(volumeID)
if err == nil {
klog.Infof("deprovisioned volume %s", volumeID)
klog.Infof("deprovisioning volume %s, marking deletion timestamp", volumeID)
Member:

What do we mean by "marking deletion timestamp" here?

Member:

I guess it means the deletion timestamp is being set?

Member Author:

Here we call Delete, which only sets the deletion timestamp because the CR has a finalizer.

Shall I remove "marking deletion timestamp" from the log message?

Member:

I would keep that bit out since it's implied, but fine either way

Member Author:

I will remove it.
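The thread above relies on standard Kubernetes finalizer semantics: deleting an object that still lists finalizers only sets `metadata.deletionTimestamp`, and the object disappears once its finalizer list is empty. An in-memory model of that rule, with hypothetical `store` and `objectMeta` types rather than the apimachinery ones:

```go
package main

import (
	"fmt"
	"time"
)

// objectMeta mirrors only the two metadata fields that matter here; it is a
// hypothetical simplification, not the real k8s.io/apimachinery ObjectMeta.
type objectMeta struct {
	Finalizers        []string
	DeletionTimestamp *time.Time
}

// store is an in-memory stand-in for the API server.
type store map[string]*objectMeta

// Delete follows the finalizer rule: while finalizers remain, a delete
// request only sets DeletionTimestamp and the object stays visible.
func (s store) Delete(name string) {
	obj, ok := s[name]
	if !ok {
		return
	}
	if len(obj.Finalizers) == 0 {
		delete(s, name)
		return
	}
	if obj.DeletionTimestamp == nil {
		now := time.Now()
		obj.DeletionTimestamp = &now
	}
}

// SetFinalizers updates the list; an object already marked for deletion is
// removed as soon as its finalizer list becomes empty.
func (s store) SetFinalizers(name string, finalizers []string) {
	obj, ok := s[name]
	if !ok {
		return
	}
	obj.Finalizers = finalizers
	if obj.DeletionTimestamp != nil && len(finalizers) == 0 {
		delete(s, name)
	}
}

func main() {
	s := store{"pvc-1": {Finalizers: []string{"lvm.openebs.io/finalizer"}}}
	s.Delete("pvc-1")
	fmt.Println(s["pvc-1"].DeletionTimestamp != nil) // true: only marked
	s.SetFinalizers("pvc-1", nil)
	_, exists := s["pvc-1"]
	fmt.Println(exists) // false: actually deleted
}
```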

@tiagolobocastro (Member) left a comment

What about existing volumes?

}
present := isNodePresent(vol.Spec.OwnerNodeID)
if !present {
klog.Infof("Removing finalizer as node %s is not present in cluster", vol.Spec.OwnerNodeID)
Member:

Should this be a warning?

Member Author:

Yes, I will make it a warning.


_, err := volbuilder.NewKubeclient().WithNamespace(LvmNamespace).Update(vol)
if err != nil {
klog.Infof("Finalizer removed successfully for %s", vol.Name)
Member:

Do we need this as info or is debug fine?

Member Author:

This code is only reached when the node is not present, so it is better to have it in the log by default.

Member:

Don't we already have the message from above where we call it if the node is not present?
Oh, also this check seems reversed: if err != nil, then we didn't remove the finalizer?

Member Author:

Yes, err != nil means we failed to remove the finalizer, so we return err. Otherwise, we move on to check whether the LVMVolume CR is deleted.

"failed to handle delete volume request for {%s}", volumeID)
}
}
present := isNodePresent(vol.Spec.OwnerNodeID)
Member:

What if the finalizer is already removed? Then we don't need to run this?

Member Author:

The finalizer is removed by the node-agent when lvremove succeeds. Here we check whether the node is present; if it is not, nobody else will remove the finalizer, so we remove it from here.

Member:

If the LVM volume has additional finalizers (e.g. added by a user) but we've already removed ours on a previous CSI call, then we don't need to do this again. We only need this stage if our finalizer is present.

@abhilashshetty04 (Member Author), Nov 9, 2025:

Right now this removes all finalizers. I am reusing the function that the node-agent calls once lvremove is done.

https://github.com/openebs/lvm-localpv/blob/develop/pkg/lvm/volume.go#L217C1-L223C2

The LVMVolume CR is managed by LocalPV-LVM, so users are not expected to add finalizers to it.
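The reviewer's alternative, removing only LocalPV-LVM's own finalizer and skipping the update when it is already absent, could look like the sketch below. This is not what the PR does (it reuses the node-agent helper that clears all finalizers); `ownFinalizer` and `removeFinalizer` are hypothetical names.

```go
package main

import "fmt"

// ownFinalizer is a hypothetical placeholder for the finalizer LocalPV-LVM
// adds; substitute the real constant from the codebase.
const ownFinalizer = "lvm.openebs.io/finalizer"

// removeFinalizer returns the list without target and reports whether target
// was present. When it was absent, the caller can skip the Update call.
func removeFinalizer(finalizers []string, target string) ([]string, bool) {
	out := make([]string, 0, len(finalizers))
	found := false
	for _, f := range finalizers {
		if f == target {
			found = true
			continue
		}
		out = append(out, f) // keep finalizers owned by anyone else
	}
	return out, found
}

func main() {
	current := []string{"example.com/user-hold", ownFinalizer}

	next, changed := removeFinalizer(current, ownFinalizer)
	fmt.Println(next, changed) // [example.com/user-hold] true: issue an Update

	_, changed = removeFinalizer(next, ownFinalizer)
	fmt.Println(changed) // false: ours is already gone, nothing to do
}
```

The trade-off is one extra check per call versus the author's point that no other finalizers are expected on this CR.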


_, err := volbuilder.NewKubeclient().WithNamespace(LvmNamespace).Update(vol)
if err != nil {
klog.Infof("Finalizer removed successfully for %s", vol.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we already have the message from above where we call it if the node is not present?
Oh also this check seems reversed, if err != nil then we didn't remove the finalizer?

err = volbuilder.NewKubeclient().WithNamespace(LvmNamespace).Delete(volumeID)
if err == nil {
klog.Infof("deprovisioned volume %s", volumeID)
klog.Infof("deprovisioning volume %s, marking deletion timestamp", volumeID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep that bit out since it's implied, but fine either way

"failed to handle delete volume request for {%s}", volumeID)
}
}
present := isNodePresent(vol.Spec.OwnerNodeID)
if !present {
klog.Infof("Removing finalizer as node %s is not present in cluster", vol.Spec.OwnerNodeID)
if err = lvm.RemoveVolFinalizer(vol); err != nil {
return nil
Member:

Shouldn't we return err here?

Member Author:

Fail the DeleteVolume call? I guess that would be better.
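Why returning `nil` there matters: in Kubernetes, the CSI external-provisioner typically retries a failed DeleteVolume with backoff, so propagating the error gives the finalizer removal another chance, while swallowing it reports success and leaves the CR behind. A stdlib-only simulation, where `fakeVolume`, `callUntilSuccess` and the one-time failure are assumptions standing in for the real API and retry loop:

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict simulates a transient API error (for example an update conflict).
var errConflict = errors.New("update conflict")

// fakeVolume tracks whether our finalizer is still present.
type fakeVolume struct {
	hasFinalizer bool
	failuresLeft int // how many removal attempts fail before one succeeds
}

func (v *fakeVolume) removeFinalizer() error {
	if v.failuresLeft > 0 {
		v.failuresLeft--
		return errConflict
	}
	v.hasFinalizer = false
	return nil
}

// deleteVolumeBuggy mirrors the reviewed line: the failure is swallowed.
func deleteVolumeBuggy(v *fakeVolume) error {
	if err := v.removeFinalizer(); err != nil {
		return nil
	}
	return nil
}

// deleteVolumeFixed propagates the error so the call fails and is retried.
func deleteVolumeFixed(v *fakeVolume) error {
	if err := v.removeFinalizer(); err != nil {
		return fmt.Errorf("failed to remove finalizer: %w", err)
	}
	return nil
}

// callUntilSuccess is a hypothetical stand-in for the caller's retry loop;
// it returns the attempt number that reported success.
func callUntilSuccess(maxAttempts int, call func() error) int {
	for i := 1; i <= maxAttempts; i++ {
		if call() == nil {
			return i
		}
	}
	return maxAttempts
}

func main() {
	buggy := &fakeVolume{hasFinalizer: true, failuresLeft: 1}
	callUntilSuccess(5, func() error { return deleteVolumeBuggy(buggy) })
	fmt.Println("buggy leaves finalizer:", buggy.hasFinalizer) // true

	fixed := &fakeVolume{hasFinalizer: true, failuresLeft: 1}
	n := callUntilSuccess(5, func() error { return deleteVolumeFixed(fixed) })
	fmt.Println("fixed leaves finalizer:", fixed.hasFinalizer, "after", n, "attempts") // false after 2
}
```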

@abhilashshetty04 force-pushed the lvmvolume_leak_fix branch 3 times, most recently from da04e2d to d766fd7 November 10, 2025 14:22
…e volume is not present

Signed-off-by: Abhilash Shetty <[email protected]>
Development

Successfully merging this pull request may close these issues.

Slow leak of LVMVolume Kubernetes objects causes extremely slow allocation of new volumes

4 participants