Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add new gauge metric for alertmanager config size #9267

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

titolins
Copy link
Contributor

@titolins titolins marked this pull request as ready for review September 11, 2024 12:06
@titolins titolins requested review from a team as code owners September 11, 2024 12:06
@titolins titolins force-pushed the titolins/add_remote_config_size_metric branch from 550ff6d to ed8da98 Compare September 11, 2024 12:07
Copy link
Contributor

@santihernandezc santihernandezc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! A few comments:

Comment on lines 405 to 408
configSize: promauto.With(registerer).NewGaugeVec(prometheus.GaugeOpts{
Name: "cortex_alertmanager_config_size_bytes",
Help: "Size of the last received alertmanager configuration.",
}, []string{"user"}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is a bit ambiguous, it's not clear that we're only using this for Grafana configurations. Maybe something like cortex_alertmanager_grafana_config_size_bytes is enough to avoid confusions.

The multitenantAlertmanagerMetrics struct might be a better place for defining this metric. We could add a new field (e.g. grafanaConfigSize) and initialize the gauge in newMultitenantAlertmanagerMetrics()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name is a bit ambiguous, it's not clear that we're only using this for Grafana configurations. Maybe something like cortex_alertmanager_grafana_config_size_bytes is enough to avoid confusions.

Yep, good point - changed now 👍

The multitenantAlertmanagerMetrics struct might be a better place for defining this metric. We could add a new field (e.g. grafanaConfigSize) and initialize the gauge in newMultitenantAlertmanagerMetrics()

Agreed. Also fixed now 👍

@@ -316,6 +316,7 @@ func (am *MultitenantAlertmanager) SetUserGrafanaConfig(w http.ResponseWriter, r
util.WriteJSONResponse(w, errorResult{Status: statusError, Error: fmt.Sprintf("%s: %s", errReadingGrafanaConfig, err.Error())})
return
}
am.multitenantMetrics.grafanaConfigSize.WithLabelValues(userID).Set(float64(len(payload)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few problems with this approach:

  • If the AM restarts, the Grafana configuration might be there, but we won't get metrics for it until the next POST request from Grafana to update the config
  • The metric doesn't change when the configuration is deleted

What if we updated this metric during the config sync for each tenant? E.g. in computeConfig(), by setting it to the len(cfgs.Grafana.RawConfig).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, moved that to computeConfig now 👍

pkg/alertmanager/multitenant.go Outdated Show resolved Hide resolved
@@ -721,10 +728,12 @@ func (am *MultitenantAlertmanager) computeConfig(cfgs alertspb.AlertConfigDescs)
// Grafana configuration.
case cfgs.Mimir.RawConfig == am.fallbackConfig:
level.Debug(am.logger).Log("msg", "mimir configuration is default, using grafana config with the default globals", "user", cfgs.Mimir.User)
am.multitenantMetrics.grafanaConfigSize.WithLabelValues(cfgs.Grafana.User).Set(float64(len(cfgs.Grafana.RawConfig)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do this once at the top of the method, before entering the switch. Even if the Grafana config is ignored, we'll get the size in bytes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I wasn't sure if we actually wanted to record it in those cases.
I've moved it out of computeConfigs and into syncConfigs instead, mainly because of the user reference from the cfgMap but also considering there are other metrics being recorded there. Please let me know if you agree 🙌

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants