Skip to content

Conversation

@mdbooth
Copy link
Contributor

@mdbooth mdbooth commented Nov 4, 2025

No description provided.

@openshift-ci-robot
Copy link

Pipeline controller notification
This repository is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will utilize /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. Review these jobs and use /test <job> to manually trigger optional jobs most likely to be impacted by the proposed changes.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 4, 2025
@openshift-ci-robot
Copy link

openshift-ci-robot commented Nov 4, 2025

@mdbooth: This pull request references OCPCLOUD-3214 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 4, 2025

Hello @mdbooth! Some important instructions when contributing to openshift/api:
API design plays an important part in the user experience of OpenShift and as such API PRs are subject to a high level of scrutiny to ensure they follow our best practices. If you haven't already done so, please review the OpenShift API Conventions and ensure that your proposed changes are compliant. Following these conventions will help expedite the api review process for your PR.

@openshift-ci openshift-ci bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Nov 4, 2025
@mdbooth mdbooth marked this pull request as draft November 4, 2025 16:40
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 4, 2025
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 4, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign everettraven for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mdbooth mdbooth marked this pull request as ready for review November 4, 2025 16:44
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 4, 2025
// The maximum number of unmanagedCustomResourceDefinitions is 128.
//
// +optional
// +listType=set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: is this resource created by each actor or will there be one instance with co-ownership from different actors?

Example 1:

  • 1 ClusterAPI object created by HyperShift
  • 1 ClusterAPI object created by ACM

Example 2:

  • 1 ClusterAPI object, some entries in .spec.unmanagedCustomResourceDefinitions are from HyperShift, others are from ACM.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't have duplicates in listType=set, so multiple actors would own the same entry if that happened.

Each actor simply needs to submit an SSA transaction with their own requirements without reference to any other actors.

}

// ClusterAPISpec defines the desired configuration of the capi-operator.
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.unmanagedCustomResourceDefinitions) || has(self.unmanagedCustomResourceDefinitions)",message="unmanagedCustomResourceDefinitions cannot be unset once set"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We validate removal to not happen, do we also validate deletion of the object to not happen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting 🤔

I'm not convinced we can validate this here. I suspect we don't want to validate this at all, though: if the user is deleting ClusterOperators then they're already having a bad day. They've chosen this course of action and they're going to do it regardless. Whatever barriers we raise are just another in a succession of irritations that will be circumvented: disabling CVO, deleting VAPs, webhooks, etc. We can discourage the use of footguns, but ultimately we can't uninvent them.

Comment on lines 133 to 135
# TODO: Cannot test the XValidation rule "unmanagedCustomResourceDefinitions cannot be unset once set"
# because MinProperties=1 on ClusterAPISpec prevents creating spec: {} in the first place.
# The validation is still enforced at the API level via the XValidation rule.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this work? (I tried it)

Suggested change
# TODO: Cannot test the XValidation rule "unmanagedCustomResourceDefinitions cannot be unset once set"
# because MinProperties=1 on ClusterAPISpec prevents creating spec: {} in the first place.
# The validation is still enforced at the API level via the XValidation rule.
- name: Should not allow unsetting unmanagedCustomResourceDefinitions
initial: |
apiVersion: operator.openshift.io/v1alpha1
kind: ClusterAPI
metadata:
name: cluster
spec:
unmanagedCustomResourceDefinitions:
- clusters.cluster.x-k8s.io
- machines.cluster.x-k8s.io
updated: |
apiVersion: operator.openshift.io/v1alpha1
kind: ClusterAPI
metadata:
name: cluster
spec:
unmanagedCustomResourceDefinitions: null
expectedError: "unmanagedCustomResourceDefinitions cannot be unset once set"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think set to null is the same as unset.

However, I decided to go a different direction with this: I made spec optional. I think this should be the default for a ClusterOperator, tbh.

@openshift-ci
Copy link
Contributor

openshift-ci bot commented Nov 10, 2025

@mdbooth: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Copy link
Contributor

@yuqi-zhang yuqi-zhang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General structure looks fine, some questions inline

Also, based on Claude suggestion, all optional fields (spec, status, UnmanagedCustomResourceDefinitions, TargetConfigMaps, ActiveConfigMaps) should have a godoc line explaining what happens if the value is empty. (example: When omitted, the capi-operator will use default configuration with all CRDs managed by the operator. in the spec, assuming we keep spec optional)

//
// Compatibility level 4: No compatibility is provided, the API can change at any point for any reason. These capabilities should not be used by applications needing long term support.
// +openshift:compatibility-gen:level=4
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.spec) || !has(oldSelf.spec.unmanagedCustomResourceDefinitions) || has(self.spec.unmanagedCustomResourceDefinitions)",message="unmanagedCustomResourceDefinitions cannot be unset once set"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for validations, it's best to have it explicitly in the godoc as well if possible (so in this case, maybe just unmanagedCustomResourceDefinitions cannot be unset once set as a godoc line is fine?).

Side note, I thought that if we don't edit unmanagedCustomResourceDefinitions but instead delete the whole spec field after setting it at some point, self.spec.unmanagedCustomResourceDefinitions would maybe not evaluate correctly, but it seems that CEL can handle that null exception (and you have a test case above covering it)

// Compatibility level 4: No compatibility is provided, the API can change at any point for any reason. These capabilities should not be used by applications needing long term support.
// +openshift:compatibility-gen:level=4
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.spec) || !has(oldSelf.spec.unmanagedCustomResourceDefinitions) || has(self.spec.unmanagedCustomResourceDefinitions)",message="unmanagedCustomResourceDefinitions cannot be unset once set"
type ClusterAPI struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, like the other operator types, I assume this will be a singleton. Interestingly enough the only one to enforce it is the olm operator: https://github.com/openshift/api/blob/master/operator/v1alpha1/types_olm.go#L25

I think generally speaking it's a good pattern to have, so probably worth adding that? Unless I'm misunderstanding how this can be configured.


// spec is the specification of the desired behavior of the capi-operator.
// +optional
Spec *ClusterAPISpec `json:"spec,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on updated guidance, I think now we only pointers if there's a difference between the unset and empty struct, otherwise having it same as the status field below (not a pointer, omitzero) is the best option.

In this case, since "" and empty should both mean "no CRDs are unmanaged", I think this falls under that umbrella?

The other way we could maybe do this (and I don't know what is "better" conventions-wise) is to instead make this required, keep unmanagedCustomResourceDefinitions optional, move the // +kubebuilder:validation:XValidation:rule="!has(oldSelf.spec) || !has(oldSelf.spec.unmanagedCustomResourceDefinitions) || has(self.spec.unmanagedCustomResourceDefinitions)",message="unmanagedCustomResourceDefinitions cannot be unset once set" validation here to the spec, and remove the need for nil checking generally. This might be more future proof in case we want to add more spec fields in the future.


// ClusterAPISpec defines the desired configuration of the capi-operator.
type ClusterAPISpec struct {
// unmanagedCustomResourceDefinitions is a list of ClusterResourceDefinition (CRD)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably, this is only a set of CRDs that CAPI operator would use, so e.g. if I set "MachineConfigs" here, nothing would happen since CAPI doesn't manage MachineConfigs, nor can it tell CVO/MCO to not deploy/use some other api version.

Would there be a way for the user to discover what CRDs is able to be unmanaged here? Is that a big list or something we can add to the godoc/validator?

Reading this and openshift/enhancements#1845 I had initially thought that this is a generic list you can apply to any CRD but that shouldn't be the case right?

}

// ClusterAPIStatus describes the current state of the capi-operator.
// +kubebuilder:validation:MinProperties=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: I assume only CAPI-operator would be updating the status of this object. Does the validation here help prevent a specific scenario?

(should be fine to have, just curious why this validation was added)

// ClusterAPIStatus describes the current state of the capi-operator.
// +kubebuilder:validation:MinProperties=1
type ClusterAPIStatus struct {
// targetConfigMaps is a list of ConfigMap names that the staging controller
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing some context but I'm not 100% sure what these two configmap status field entails. I assume these are here for an admin to see? Are these reflecting the status of some other action the admin took elsewhere? Or something that has to do with the CRDs they just defined in the spec? (Just from an ELI5 perspective the status fields are confusing, if there's some other way to add some more context I think that would be helpful)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants