MON-4406: ClusterMonitoring CRD Controller #2707

base: main
Conversation
danielmellado commented on Oct 10, 2025
- I added CHANGELOG entry for this change.
- No user facing changes, so no entry in CHANGELOG was needed.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danielmellado. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: needs approval from an approver in each of these files. Approvers can indicate their approval by writing
2a1c222 to 597e76f (force-push)
/jira backport release-4.20
@danielmellado: The following backport issues have been created. Queuing cherry-picks to the requested branches; they will be created after this PR merges.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@openshift-ci-robot: once the present PR merges, I will cherry-pick it on top of the requested branch.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
597e76f to e295a39 (force-push)

e295a39 to a4de54d (force-push)
/retitle MON-4406: ClusterMonitoring CRD Controller
@danielmellado: This pull request references MON-4406, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/retest-required

/cc @simonpasquier
can we move it to its own package? pkg/alert is for AlertingRule resources.
Sure, I'll move it to pkg/clustermonitoring/ to keep things clear, thanks!
```go
	}

	for i := 0; i < workers; i++ {
		go c.worker(ctx)
```
we don't need multiple workers?
Looking at the other controllers in CMO (RuleController, RelabelConfigController), they accept the workers parameter but always run a single worker. I'll update to match that pattern.
```go
)

// ClusterMonitoringController is a controller for ClusterMonitoring resources.
type ClusterMonitoringController struct {
```
I'm not sure that we need a full-blown controller? What we want is something that triggers a CMO reconciliation anytime the config custom resource gets updated.
The controller is already fairly minimal - the sync() function just calls triggerReconcile(). I can simplify it a bit further but I think keeping the basic informer + workqueue structure works just fine. What would you suggest? Let me know if you'd prefer a different approach.
I'd follow what we do for secrets & configmaps: create a new shared informer watching the CMO config custom resource in pkg/operator and call *Operator.handleEvent() whenever we see an update.
pkg/operator/operator.go (outdated diff)
```go
		return fmt.Errorf("failed to get ClusterMonitoring CRD: %w", err)
	}

	if crd.Spec.AlertmanagerConfig.DeploymentMode != "" {
```
DeploymentMode can't be empty given the API validation?
You're right, the API validation ensures a default value. I'll remove this check.
pkg/operator/operator.go (outdated diff)
```go
func (o *Operator) mergeAlertmanagerConfig(c *manifests.Config, amConfig *configv1alpha1.AlertmanagerConfig) error {
	if amConfig.DeploymentMode == configv1alpha1.AlertManagerDeployModeDisabled {
		enabled := false
		c.ClusterMonitoringConfiguration.AlertmanagerMainConfig.Enabled = &enabled
```
(nit)
```diff
-		c.ClusterMonitoringConfiguration.AlertmanagerMainConfig.Enabled = &enabled
+		c.ClusterMonitoringConfiguration.AlertmanagerMainConfig.Enabled = ptr.To(false)
```
Done, thanks for the suggestion ;)
pkg/operator/operator.go (outdated diff)
```go
}

func (o *Operator) mergeAlertmanagerConfig(c *manifests.Config, amConfig *configv1alpha1.AlertmanagerConfig) error {
	if amConfig.DeploymentMode == configv1alpha1.AlertManagerDeployModeDisabled {
```
(nit) I'd prefer a switch/case to ensure that we don't miss any value.
Agree, I'll change this to a switch with a default case that returns an error for unknown values.
pkg/operator/operator.go (outdated diff)
```go
	if amConfig.DeploymentMode == configv1alpha1.AlertManagerDeployModeDefaultConfig {
		enabled := true
		c.ClusterMonitoringConfiguration.AlertmanagerMainConfig.Enabled = &enabled
```
what does it mean if other Alertmanager options are set in the CMO configmap? Are they taken into account or should we void them?
Currently, the CRD values are merged on top of configmap values - so configmap settings remain unless explicitly overridden by the CRD. For Disabled and DefaultConfig modes, only the Enabled flag is set. For CustomConfig, specific fields from the CRD override the corresponding configmap fields.
pkg/operator/operator.go (outdated diff)
```go
		relabelController: relabelController,
	}

	clusterMonitoringController, err = alert.NewClusterMonitoringController(ctx, c, version, func() {
```
should we read the feature gates from the API and run the controller (+merge the config) only if the ClusterMonitoringConfig gate is enabled?
Good point. The merge itself is safe (it returns early if the CR is not found), but you're right that we shouldn't start the controller if the feature gate is disabled; it would create an informer watching a non-existent API. Admittedly, the implementation within the CMO itself is a bit mixed, but I can add that check!
a4de54d to a9c6a82 (force-push)
Adds a ClusterMonitoring controller that watches the CRD and triggers reconciliation. Implements merge logic to apply AlertmanagerConfig settings from the CRD over the existing ConfigMap configuration. Supports three deployment modes (Disabled, DefaultConfig, CustomConfig) with fields for pod scheduling, resources, secrets, volumeClaimTemplate, and logLevel.
a9c6a82 to 6323f53 (force-push)
Thinking more about it, I'd go with a first PR which only watches the CMO config resource and calls the reconciliation loop on changes, without merging the two configs.
I understand that you want to simplify the architecture; the only difference is that this resource is cluster-wide. To keep things moving, I've created #2770 for this first PR. Thanks!
@danielmellado: The following test failed.

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.