
Conversation

@danielmellado
Contributor

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 10, 2025
@openshift-ci openshift-ci bot requested review from jan--f and marioferh October 10, 2025 12:43
@openshift-ci
Contributor

openshift-ci bot commented Oct 10, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danielmellado

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 10, 2025
@danielmellado danielmellado force-pushed the add_alertmanager_crd_controller branch 2 times, most recently from 2a1c222 to 597e76f Compare October 13, 2025 09:04
@danielmellado
Contributor Author

/jira backport release-4.20

@openshift-ci-robot
Contributor

@danielmellado: The following backport issues have been created:

Queuing cherrypicks to the requested branches to be created after this PR merges:
/cherrypick release-4.20

Details

In response to this:

/jira backport release-4.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-cherrypick-robot

@openshift-ci-robot: once the present PR merges, I will cherry-pick it on top of release-4.20 in a new PR and assign it to you.

Details

In response to this:

@danielmellado: The following backport issues have been created:

Queuing cherrypicks to the requested branches to be created after this PR merges:
/cherrypick release-4.20

In response to this:

/jira backport release-4.20

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@danielmellado danielmellado force-pushed the add_alertmanager_crd_controller branch from 597e76f to e295a39 Compare October 13, 2025 13:12
@danielmellado danielmellado force-pushed the add_alertmanager_crd_controller branch from e295a39 to a4de54d Compare November 20, 2025 08:27
@danielmellado
Contributor Author

danielmellado commented Nov 20, 2025

/retitle MON-4406: ClusterMonitoring CRD Controller

@openshift-ci openshift-ci bot changed the title WIP: Initial minimal ClusterMonitoring CRD Controller MON-4406: ClusterMonitoring CRD Controller Nov 20, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 20, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 20, 2025
@openshift-ci-robot
Contributor

openshift-ci-robot commented Nov 20, 2025

@danielmellado: This pull request references MON-4406 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

Details

In response to this:

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@danielmellado
Contributor Author

/retest-required

@simonpasquier
Contributor

/cc @simonpasquier

@openshift-ci openshift-ci bot requested a review from simonpasquier December 1, 2025 08:36
Contributor

can we move it to its own package? pkg/alert is for AlertingRule resources.

Contributor Author

Sure, I'll move it to pkg/clustermonitoring/ to keep things clear, thanks!

}

for i := 0; i < workers; i++ {
go c.worker(ctx)
Contributor

we don't need multiple workers?

Contributor Author

Looking at the other controllers in CMO (RuleController, RelabelConfigController), they accept the workers parameter but always run a single worker. I'll update to match that pattern.

)

// ClusterMonitoringController is a controller for ClusterMonitoring resources.
type ClusterMonitoringController struct {
Contributor

I'm not sure that we need a full-blown controller? What we want is something that triggers a CMO reconciliation anytime the config custom resource gets updated.

Contributor Author

The controller is already fairly minimal - the sync() function just calls triggerReconcile(). I can simplify it a bit further but I think keeping the basic informer + workqueue structure works just fine. What would you suggest? Let me know if you'd prefer a different approach.

Contributor

I'd follow what we do for secrets & configmaps: create a new shared informer watching the CMO config custom resource in pkg/operator and call *Operator.handleEvent() whenever we see an update.

return fmt.Errorf("failed to get ClusterMonitoring CRD: %w", err)
}

if crd.Spec.AlertmanagerConfig.DeploymentMode != "" {
Contributor

DeploymentMode can't be empty given the API validation?

Contributor Author

You're right, the API validation ensures a default value. I'll remove this check.

func (o *Operator) mergeAlertmanagerConfig(c *manifests.Config, amConfig *configv1alpha1.AlertmanagerConfig) error {
if amConfig.DeploymentMode == configv1alpha1.AlertManagerDeployModeDisabled {
enabled := false
c.ClusterMonitoringConfiguration.AlertmanagerMainConfig.Enabled = &enabled
Contributor

(nit)

Suggested change
- c.ClusterMonitoringConfiguration.AlertmanagerMainConfig.Enabled = &enabled
+ c.ClusterMonitoringConfiguration.AlertmanagerMainConfig.Enabled = ptr.To(false)

Contributor Author

Done, thanks for the suggestion ;)

}

func (o *Operator) mergeAlertmanagerConfig(c *manifests.Config, amConfig *configv1alpha1.AlertmanagerConfig) error {
if amConfig.DeploymentMode == configv1alpha1.AlertManagerDeployModeDisabled {
Contributor

(nit) I'd prefer a switch/case to ensure that we don't miss any value.

Contributor Author

Agreed, I'll change this to a switch with a default case that returns an error for unknown values.


if amConfig.DeploymentMode == configv1alpha1.AlertManagerDeployModeDefaultConfig {
enabled := true
c.ClusterMonitoringConfiguration.AlertmanagerMainConfig.Enabled = &enabled
Contributor

what does it mean if other Alertmanager options are set in the CMO configmap? Are they taken into account or should we void them?

Contributor Author

Currently, the CRD values are merged on top of configmap values - so configmap settings remain unless explicitly overridden by the CRD. For Disabled and DefaultConfig modes, only the Enabled flag is set. For CustomConfig, specific fields from the CRD override the corresponding configmap fields.

relabelController: relabelController,
}

clusterMonitoringController, err = alert.NewClusterMonitoringController(ctx, c, version, func() {
Contributor

should we read the feature gates from the API and run the controller (+merge the config) only if the ClusterMonitoringConfig gate is enabled?

Contributor Author

Good point. The merge itself is safe (it returns early if the CR is not found), but you're right that we shouldn't start the controller if the feature gate is disabled - it would create an informer watching a non-existent API. The feature-gate handling within CMO itself is a bit mixed, but I can add that check!

@danielmellado danielmellado force-pushed the add_alertmanager_crd_controller branch from a4de54d to a9c6a82 Compare December 2, 2025 18:25
Adds a ClusterMonitoring controller that watches the CRD and triggers
reconciliation. Implements merge logic to apply AlertmanagerConfig
settings from the CRD over the existing ConfigMap configuration.

Supports three deployment modes (Disabled, DefaultConfig, CustomConfig)
with fields for pod scheduling, resources, secrets, volumeClaimTemplate
and logLevel.
@danielmellado danielmellado force-pushed the add_alertmanager_crd_controller branch from a9c6a82 to 6323f53 Compare December 3, 2025 08:10
@simonpasquier
Contributor

Thinking more about it, I'd go with a first PR which only watches the CMO config resource and calls the reconciliation loop on changes without merging the 2 configs.

@danielmellado
Contributor Author

Thinking more about it, I'd go with a first PR which only watches the CMO config resource and calls the reconciliation loop on changes without merging the 2 configs.

I understand that you want to simplify the architecture; the only difference is that this resource is cluster-wide. To keep things moving, I've created #2770 for this first PR. Thanks!

@openshift-ci
Contributor

openshift-ci bot commented Dec 3, 2025

@danielmellado: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/ginkgo-tests
Commit: 6323f53
Details: link
Required: false
Rerun command: /test ginkgo-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
