Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance replicate delete policy #6247

Merged
merged 1 commit into from
Feb 28, 2025
Merged

Conversation

gerrod3
Copy link
Contributor

@gerrod3 gerrod3 commented Jan 30, 2025

Added a new field delete_policy that will determine how replicate will clean up the extra/leftover objects present in the downstream Pulp. The field is set to the default all which is the current behavior of deleting everything. The other two options are none and labeled. none is self-explanatory. Replicate now also labels each object created/updated on a replicate with the label UpstreamPulp:{upstream.pk}. This new label is what is used by the labeled policy to determine which objects should be deleted. Only objects with this label and are not present in the Upstream will be deleted under this policy, everything else will be left.

Also, added copying the Upstream objects' labels onto the downstream. This change isn't strictly necessary for the new delete feature, so I can remove it, add a new field for customization, or move it into another PR if wanted.

fixes: #5214

@gerrod3 gerrod3 force-pushed the repli-delete-policy branch from 7f1ca13 to 7ad9a18 Compare January 30, 2025 20:20
@mdellweg
Copy link
Member

Please take the following as a pure observation, something to be aware of:
This is (or in case of acceptance would be) the first time a Pulp feature depends on labels stored on it's own entities.
Yes, replicate has relied on remote labels before. But that's a different thing. Labels have been there to bridge gaps for integrators and are in full control of the user.
I can not state this is good or bad it just feels a bit odd.

NONE = "none"
DELETE_CHOICES = (
(ALL, "Delete all local objects not present on upstream."),
(LABELED, "Delete local labeled objects not present on upstream."),
Copy link
Member

@mdellweg mdellweg Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should happen if you have a local repository without a label. Should replicate replace that, leave it alone or simply fail?
I guess this question extends the policy beyond just "deleting".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that my description isn't that great. The 'local labeled objects' refers to objects that have been labeled by replicate with UpstreamPulp:{id}. So any repository without this label means it wasn't created(tracked) by replicate and thus will not be removed under this policy. Say there is a unlabeled repository with the same name as an upstream one (a case our users will be in on their first replicate if this is merged), then that repository will get the new label and then be tracked for all future replicates.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will we set the labels anyway? Then users could run one more "old style" replication to label all the affected ones and change the flag then.
Now we could choose to "protect" manually created repositories from the replicators.

@mdellweg mdellweg mentioned this pull request Feb 7, 2025
@gerrod3 gerrod3 force-pushed the repli-delete-policy branch 6 times, most recently from 6875981 to 2d7ed9b Compare February 18, 2025 20:00
@gerrod3
Copy link
Contributor Author

gerrod3 commented Feb 18, 2025

Ok I updated the PR to change the field to just policy. It does almost the exact same behavior, but now it also doesn't update existing local objects during the LABELED policy if there is a name collision and the object does not have a matching UpstreamPulp label. Also, renamed the NONE policy to NODELETE.

Copy link
Member

@mdellweg mdellweg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damn, this review was pending for some days...

Comment on lines +104 to +106
upstream_labels = getattr(upstream_object, "pulp_labels", {})
upstream_labels["UpstreamPulp"] = str(self.server.pk)
return upstream_labels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if we replicate from another replica, we overwrite the "UpstreamPulp" label.
I think this is exactly what we should do. ;)

Comment on lines +83 to +86
if self.server.policy == UpstreamPulp.LABELED:
if object.pulp_labels.get("UpstreamPulp") != str(self.server.pk):
return False
return True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This means in the "NODELETE" policy, we still allow local repositories to be overwritten, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, would you like it to also be protected?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if that would make a forth policy. We can also just document that and maybe add the forth policy later.

@@ -91,6 +91,13 @@ class UpstreamPulpSerializer(ModelSerializer, HiddenFieldsMixin):
read_only=True,
)

policy = serializers.ChoiceField(
choices=UpstreamPulp.POLICY_CHOICES,
default="all",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this would be more consistent...

Suggested change
default="all",
default=UpstreamPulp.ALL,

But actually defaults on serializers interfere badly with partial_update.
We should really do:

Suggested change
default="all",
required=False,

and let the model handle the default.

@gerrod3 gerrod3 force-pushed the repli-delete-policy branch from 2d7ed9b to 49dc219 Compare February 26, 2025 04:36
mdellweg
mdellweg previously approved these changes Feb 26, 2025
@mdellweg
Copy link
Member

There's a migration conflict needing this to rebase.

@ggainey ggainey merged commit 210ad53 into pulp:main Feb 28, 2025
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhance replicate feature
3 participants