MON-4406: ClusterMonitoring CRD Controller #2707

base: main
Conversation
danielmellado commented on Oct 10, 2025
- I added CHANGELOG entry for this change.
- No user facing changes, so no entry in CHANGELOG was needed.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danielmellado. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Details: needs approval from an approver in each of these files. Approvers can indicate their approval by writing
2a1c222 to 597e76f (force-push)
/jira backport release-4.20
@danielmellado: The following backport issues have been created. Queuing cherry-picks to the requested branches; they will be created after this PR merges.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
@openshift-ci-robot: once the present PR merges, I will cherry-pick it on top of the requested branch.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
597e76f to e295a39 (force-push)

e295a39 to a4de54d (force-push)
/retitle MON-4406: ClusterMonitoring CRD Controller
@danielmellado: This pull request references MON-4406, which is a valid Jira issue. Warning: the referenced Jira issue has an invalid target version for the branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/retest-required

/cc @simonpasquier
can we move it to its own package? pkg/alert is for AlertingRule resources.
Sure, I'll move it to pkg/clustermonitoring/ to keep things clear, thanks!
```go
	}

	for i := 0; i < workers; i++ {
		go c.worker(ctx)
```
we don't need multiple workers?
Looking at the other controllers in CMO (RuleController, RelabelConfigController), they accept the workers parameter but always run a single worker. I'll update to match that pattern.
```go
)

// ClusterMonitoringController is a controller for ClusterMonitoring resources.
type ClusterMonitoringController struct {
```
I'm not sure that we need a full-blown controller? What we want is something that triggers a CMO reconciliation anytime the config custom resource gets updated.
The controller is already fairly minimal - the sync() function just calls triggerReconcile(). I can simplify it a bit further but I think keeping the basic informer + workqueue structure works just fine. What would you suggest? Let me know if you'd prefer a different approach.
I'd follow what we do for secrets & configmaps: create a new shared informer watching the CMO config custom resource in pkg/operator and call *Operator.handleEvent() whenever we see an update.
pkg/operator/operator.go (outdated diff)
```go
		return fmt.Errorf("failed to get ClusterMonitoring CRD: %w", err)
	}

	if crd.Spec.AlertmanagerConfig.DeploymentMode != "" {
```
DeploymentMode can't be empty given the API validation?
You're right, the API validation ensures a default value. I'll remove this check.
pkg/operator/operator.go (outdated diff)
```go
func (o *Operator) mergeAlertmanagerConfig(c *manifests.Config, amConfig *configv1alpha1.AlertmanagerConfig) error {
	if amConfig.DeploymentMode == configv1alpha1.AlertManagerDeployModeDisabled {
		enabled := false
		c.ClusterMonitoringConfiguration.AlertmanagerMainConfig.Enabled = &enabled
```
(nit)
```diff
-		c.ClusterMonitoringConfiguration.AlertmanagerMainConfig.Enabled = &enabled
+		c.ClusterMonitoringConfiguration.AlertmanagerMainConfig.Enabled = ptr.To(false)
```
Done, thanks for the suggestion ;)
pkg/operator/operator.go (outdated diff)
```go
}

func (o *Operator) mergeAlertmanagerConfig(c *manifests.Config, amConfig *configv1alpha1.AlertmanagerConfig) error {
	if amConfig.DeploymentMode == configv1alpha1.AlertManagerDeployModeDisabled {
```
(nit) I'd prefer a switch/case to ensure that we don't miss any value.
Agree, I'll change this to a switch with a default case that returns an error for unknown values.
pkg/operator/operator.go (outdated diff)
```go
	if amConfig.DeploymentMode == configv1alpha1.AlertManagerDeployModeDefaultConfig {
		enabled := true
		c.ClusterMonitoringConfiguration.AlertmanagerMainConfig.Enabled = &enabled
```
what does it mean if other Alertmanager options are set in the CMO configmap? Are they taken into account or should we void them?
Currently, the CRD values are merged on top of configmap values - so configmap settings remain unless explicitly overridden by the CRD. For Disabled and DefaultConfig modes, only the Enabled flag is set. For CustomConfig, specific fields from the CRD override the corresponding configmap fields.
pkg/operator/operator.go (outdated diff)
```go
		relabelController: relabelController,
	}

	clusterMonitoringController, err = alert.NewClusterMonitoringController(ctx, c, version, func() {
```
should we read the feature gates from the API and run the controller (+merge the config) only if the ClusterMonitoringConfig gate is enabled?
Good point. The merge itself is safe (it returns early if the CR is not found), but you're right that we shouldn't start the controller if the feature gate is disabled; it would create an informer watching a non-existent API. Admittedly, the implementation within the CMO itself is a bit mixed, but I can add that check!
a4de54d to a9c6a82 (force-push)
Adds a ClusterMonitoring controller that watches the CRD and triggers reconciliation. Implements merge logic to apply AlertmanagerConfig settings from the CRD over the existing ConfigMap configuration. Supports three deployment modes (Disabled, DefaultConfig, CustomConfig) with fields for pod scheduling, resources, secrets, volumeClaimTemplate, and logLevel.
a9c6a82 to 6323f53 (force-push)
Thinking more about it, I'd go with a first PR which only watches the CMO config resource and calls the reconciliation loop on changes, without merging the two configs.
I understand that you want to simplify the architecture; the only difference is that this resource is cluster-wide. To keep things moving, I've created #2770 for this first PR. Thanks!
@danielmellado: The following test failed.

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.