Skip to content

Conversation

Techassi
Copy link
Member

Part of stackabletech/secret-operator#634 and https://github.com/stackabletech/decisions/issues/62.

This PR adds a new CRD maintainer which reconciles the CRDs when the conversion webhook server certificate is rotated. It additionally provides a oneshot channel to indicate that the initial CRD rollout is done and default custom resources can be applied.

This is a crude first implementation of the CRD and default CR maintenance mechanism. This PR will be used as a test-balloon in stackabletech/secret-operator#634.

In the future, this mechanism should be implemented in a more generic way, similar to kube::Controller.

@Techassi Techassi self-assigned this Sep 23, 2025
@Techassi Techassi moved this to Development: In Progress in Stackable Engineering Sep 23, 2025
@Techassi Techassi changed the title feat: Add CRD maintainer feat!: Add CRD maintainer Sep 23, 2025
@Techassi Techassi marked this pull request as ready for review September 24, 2025 09:08
@Techassi Techassi marked this pull request as draft September 24, 2025 09:36
@Techassi Techassi force-pushed the feat/crd-maintenance branch from 6c305d8 to 20656a6 Compare September 25, 2025 10:12
@Techassi Techassi force-pushed the feat/crd-maintenance branch from 20656a6 to 2fc5c28 Compare September 25, 2025 10:14
@Techassi Techassi marked this pull request as ready for review September 25, 2025 10:18
@Techassi Techassi moved this from Development: In Progress to Development: Waiting for Review in Stackable Engineering Sep 25, 2025
@sbernauer sbernauer moved this from Development: Waiting for Review to Development: In Review in Stackable Engineering Sep 30, 2025
Copy link
Member

@sbernauer sbernauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any example code in secret-op I can look at?
At a first glance it looks like stackabletech/secret-operator#634 is not using it yet

@Techassi Techassi requested a review from sbernauer October 1, 2025 12:16
Copy link
Member

@sbernauer sbernauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Techassi pushed his code, so we have a working example in secret-op.

What I don't like is that the webhook and the maintainer are not integrated into each other and we basically need to configure everything twice, which is annoying and error-prone:

pub async fn conversion_webhook(
    operator_environment: &OperatorEnvironmentOptions,
) -> anyhow::Result<(ConversionWebhookServer, mpsc::Receiver<Certificate>)> {
    let crds_and_handlers = [
        (
            SecretClass::merged_crd(SecretClassVersion::V1Alpha2)?,
            SecretClass::try_convert as fn(_) -> _,
        ),
        (
            TrustStore::merged_crd(TrustStoreVersion::V1Alpha1)?,
            TrustStore::try_convert as fn(_) -> _,
        ),
    ];

    let options = ConversionWebhookOptions {
        socket_addr: ConversionWebhookServer::DEFAULT_SOCKET_ADDRESS,
        namespace: operator_environment.operator_namespace.clone(),
        service_name: operator_environment.operator_service_name.clone(),
    };

    Ok(ConversionWebhookServer::new(crds_and_handlers, options).await?)
}

and further down

            let (maintainer, initial_reconcile_rx) = CustomResourceDefinitionMaintainer::new(
                client.as_kube_client(),
                certificate_rx,
                [
                    SecretClass::merged_crd(SecretClassVersion::V1Alpha2).unwrap(),
                    TrustStore::merged_crd(TrustStoreVersion::V1Alpha1).unwrap(),
                ],
                CustomResourceDefinitionMaintainerOptions {
                    operator_service_name: operator_environment.operator_service_name,
                    operator_namespace: operator_environment.operator_namespace,
                    field_manager: OPERATOR_NAME.to_owned(),
                    webhook_https_port: WebhookServer::DEFAULT_HTTPS_PORT,
                    disabled: maintenance.disable_crd_maintenance,
                },
            );

We duplicate:

  • The listen port
  • operator namespace
  • operator service name
  • The list of CRDS. What happens when I add a CRD to one but not hte other?
  • What stored version the CRDs should use. What happens if we use different onces?
  • We need to know that there is a channel of CA certs in between and need to wire it up

I'd say all of this can be gathered from the webhook.
I'm thinking of a API similar to this (one code place instead of two - keep main.rs clean!)

pub async fn conversion_webhook_and_crd_maintainer(
    client: Client,
    operator_environment: &OperatorEnvironmentOptions,
    maintenance: &MaintenanceOptions,
) -> anyhow::Result<(
    ConversionWebhookServer,
    CustomResourceDefinitionMaintainer,
    oneshot::Receiver<()>,
)> {
    let crds_and_handlers = [
        (
            SecretClass::merged_crd(SecretClassVersion::V1Alpha2)?,
            SecretClass::try_convert as fn(_) -> _,
        ),
        (
            TrustStore::merged_crd(TrustStoreVersion::V1Alpha1)?,
            TrustStore::try_convert as fn(_) -> _,
        ),
    ];

    let options = ConversionWebhookOptions {
        listen_addr: ConversionWebhookServer::DEFAULT_LISTEN_ADDRESS,
        listen_port: ConversionWebhookServer::DEFAULT_HTTPS_PORT,
        namespace: operator_environment.operator_namespace.clone(),
        service_name: operator_environment.operator_service_name.clone(),
    };

    let conversion_webhook = ConversionWebhookServer::new(crds_and_handlers, options).await?;
    let (crd_maintainer, initial_reconcile_rx) =
        CustomResourceDefinitionMaintainer::for_conversion_webhook(
            &conversion_webhook,
            client,
            OPERATOR_NAME.to_owned(),
            maintenance.disable_crd_maintenance,
        );

    Ok((conversion_webhook, crd_maintainer, initial_reconcile_rx))
}

We only specify everything once.

@Techassi Techassi force-pushed the feat/crd-maintenance branch from f5a1841 to 99eae00 Compare October 8, 2025 14:46
@Techassi
Copy link
Member Author

Techassi commented Oct 8, 2025

I had similar ideas to what is mentioned in #1099 (review). Commit 99eae00 implements a simplified API to construct a webhook server in combination with a CRD maintainer.

This follow-up commit uses the improved code: stackabletech/secret-operator@788dd08 (#634)

@Techassi Techassi requested a review from sbernauer October 8, 2025 14:49
Co-authored-by: Sebastian Bernauer <[email protected]>
@Techassi Techassi requested a review from sbernauer October 13, 2025 12:46
Copy link
Member

@sbernauer sbernauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that was it. Calling code in stackabletech/secret-operator#634 looks very nice and clean

// TODO (@Techassi): Use a trait type here which can be used to build all part of the
// conversion webhook server and a CRD maintainer.
crds_and_handlers: impl IntoIterator<Item = (CustomResourceDefinition, H)> + Clone,
operator_name: &'a str,
Copy link
Member

@sbernauer sbernauer Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some inter-mangling between the service name and the field manager.
Separated them here:

diff --git a/crates/stackable-webhook/src/maintainer.rs b/crates/stackable-webhook/src/maintainer.rs
index 649ac7d..7234898 100644
--- a/crates/stackable-webhook/src/maintainer.rs
+++ b/crates/stackable-webhook/src/maintainer.rs
@@ -94,10 +94,11 @@ impl<'a> CustomResourceDefinitionMaintainer<'a> {
     /// # async fn main() {
     /// # let (certificate_tx, certificate_rx) = channel(1);
     /// let options = CustomResourceDefinitionMaintainerOptions {
-    ///     operator_name: "my-service-name",
+    ///     operator_service_name: "my-service-name",
     ///     operator_namespace: "my-namespace",
     ///     webhook_https_port: 8443,
     ///     disabled: true,
+    ///     field_manager: "my-operator",
     /// };
     ///
     /// let client = Client::try_default().await.unwrap();
@@ -141,10 +142,11 @@ impl<'a> CustomResourceDefinitionMaintainer<'a> {
     /// [`std::task::Poll::Ready`] and thus doesn't consume any resources.
     pub async fn run(mut self) -> Result<(), Error> {
         let CustomResourceDefinitionMaintainerOptions {
+            operator_service_name,
             operator_namespace,
             webhook_https_port,
-            operator_name,
             disabled,
+            field_manager,
         } = self.options;
 
         // If the maintainer is disabled or there are no custom resource definitions, immediately
@@ -196,7 +198,7 @@ impl<'a> CustomResourceDefinitionMaintainer<'a> {
                         conversion_review_versions: vec!["v1".to_owned()],
                         client_config: Some(WebhookClientConfig {
                             service: Some(ServiceReference {
-                                name: operator_name.to_owned(),
+                                name: operator_service_name.to_owned(),
                                 namespace: operator_namespace.to_owned(),
                                 path: Some(format!("/convert/{crd_name}")),
                                 port: Some(webhook_https_port.into()),
@@ -211,7 +213,7 @@ impl<'a> CustomResourceDefinitionMaintainer<'a> {
 
                 // Deploy the updated CRDs using a server-side apply.
                 let patch = Patch::Apply(&crd);
-                let patch_params = PatchParams::apply(operator_name);
+                let patch_params = PatchParams::apply(field_manager);
                 crd_api
                     .patch(&crd_name, &patch_params, &patch)
                     .await
@@ -235,8 +237,8 @@ impl<'a> CustomResourceDefinitionMaintainer<'a> {
 // TODO (@Techassi): Make this a builder instead
 /// This contains required options to customize a [`CustomResourceDefinitionMaintainer`].
 pub struct CustomResourceDefinitionMaintainerOptions<'a> {
-    /// The service name used by the operator/conversion webhook and as a field manager.
-    pub operator_name: &'a str,
+    /// The name of the Kubernetes service which points to the operator/webhook.
+    pub operator_service_name: &'a str,
 
     /// The namespace the operator/conversion webhook runs in.
     pub operator_namespace: &'a str,
@@ -246,4 +248,7 @@ pub struct CustomResourceDefinitionMaintainerOptions<'a> {
 
     /// Indicates if the maintainer should be disabled.
     pub disabled: bool,
+
+    /// The fieldManager used when creating/patching CRDs.
+    pub field_manager: &'a str,
 }
diff --git a/crates/stackable-webhook/src/servers/conversion.rs b/crates/stackable-webhook/src/servers/conversion.rs
index 9b4beab..a71e2a7 100644
--- a/crates/stackable-webhook/src/servers/conversion.rs
+++ b/crates/stackable-webhook/src/servers/conversion.rs
@@ -194,9 +194,10 @@ impl ConversionWebhookServer {
         // TODO (@Techassi): Use a trait type here which can be used to build all part of the
         // conversion webhook server and a CRD maintainer.
         crds_and_handlers: impl IntoIterator<Item = (CustomResourceDefinition, H)> + Clone,
-        operator_name: &'a str,
+        operator_service_name: &'a str,
         operator_namespace: &'a str,
         disable_maintainer: bool,
+        field_manager: &'a str,
         client: Client,
     ) -> Result<
         (
@@ -213,9 +214,9 @@ impl ConversionWebhookServer {
 
         // TODO (@Techassi): These should be moved into a builder
         let webhook_options = ConversionWebhookOptions {
-            namespace: operator_namespace,
-            service_name: operator_name,
             socket_addr,
+            namespace: operator_namespace,
+            service_name: operator_service_name,
         };
 
         let (conversion_webhook_server, certificate_rx) =
@@ -225,10 +226,11 @@ impl ConversionWebhookServer {
 
         // TODO (@Techassi): These should be moved into a builder
         let maintainer_options = CustomResourceDefinitionMaintainerOptions {
+            operator_service_name,
+            operator_namespace,
             webhook_https_port: socket_addr.port(),
             disabled: disable_maintainer,
-            operator_namespace,
-            operator_name,
+            field_manager,
         };
 
         let (maintainer, initial_reconcile_rx) = CustomResourceDefinitionMaintainer::new(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need them to be separate? I purposely combined them to make the API easier to use. We currently use the same value for both these parameters anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's risky to enforce that the Kubernetes service name needs to be the same as the field manager.

Example: The k8s service name is passed via CLI arg, the field manager is hardcoded in Rust code.
If a user sets --k8s-service-name=foo, shit hits the fan as the field manager changes and CRD patches will fail because of conflicts between field managers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

@Techassi Techassi Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the field manager is hardcoded

I think that's something we should not do. Instead, we want to use the same name across all code (which comes from the CLI). But that's something for the future.

For now, I made the field manager a separate field again (see 16ed410).

Copy link
Member

@sbernauer sbernauer Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the field manager!
I think hard-coding the manager is fine for now, I don't know if other operators support configuring this.
But we should also change operator_name to operator_service_name as my patch did. Reason being is that the thing we add to the CRD is the name of the service pointing to the webhook (which is in our case the operator, but does not always needs to be).

/// to a handler function. In most cases, the generated `CustomResource::try_merge` function
/// should be used. It provides the expected `fn(ConversionReview) -> ConversionReview`
/// signature.
/// - `operator_name`: The name of the operator. This is used to construct the webhook service
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// - `operator_name`: The name of the operator. This is used to construct the webhook service
/// - `operator_service_name`: The name of the Kubernetes service which points to the
/// operator/webhook. This is put into the maintained CRD.

Comment on lines +200 to +201
/// - `operator_namespace`: The namespace the operator runs in. This is used to construct the
/// webhook service name.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// - `operator_namespace`: The namespace the operator runs in. This is used to construct the
/// webhook service name.
/// - `operator_namespace`: The namespace the operator runs in. This is put into the
/// maintained CRD.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Development: In Review

Development

Successfully merging this pull request may close these issues.

2 participants