@rexagod rexagod commented Oct 5, 2025

  • I added CHANGELOG entry for this change.
  • No user facing changes, so no entry in CHANGELOG was needed.

Both of these changes need to be merged together. Additionally, MON-4788 also needs to be merged with these (once it's pushed).


Marked as draft for now as this needs all non-in-cluster-monitoring-stack recording rules ported to monitors as well.

I'm not 100% sure without investigating this further, but I believe we can keep the existing machinery as is, i.e., the whitelisted machinery in CMO and everything from that point on. It would serve as (a) a safeguard to make sure we pipe out the same rules and (b) a check to make sure the metrics (rules) being collected match the rules being sent out. This will be worked on later, once we start adding tests for this in origin (after the aforementioned porting effort is done).
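As a rough illustration of check (b), and not part of this PR: a minimal Python sketch, assuming the whitelist is available as a list of `{__name__="..."}`-style matchers and that the set of metric names kept by the telemetry monitors can be listed. The function names (`allowlisted_names`, `compare`) and the sample data are hypothetical and do not come from CMO.

```python
import re

# Assumption: each whitelist entry carries an exact __name__ matcher.
# Regex matchers (__name__=~"...") are deliberately skipped by this pattern.
_NAME_RE = re.compile(r'__name__="([^"]+)"')


def allowlisted_names(matchers):
    """Extract the exact metric/rule names referenced by the matchers."""
    names = set()
    for matcher in matchers:
        found = _NAME_RE.search(matcher)
        if found:
            names.add(found.group(1))
    return names


def compare(collected, matchers):
    """Return (missing, extra) relative to what the whitelist expects.

    missing: names the whitelist sends out but the monitors do not collect.
    extra:   names the monitors collect that the whitelist never sends out.
    """
    wanted = allowlisted_names(matchers)
    collected = set(collected)
    return sorted(wanted - collected), sorted(collected - wanted)


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    sample_matchers = ['{__name__="up"}', '{__name__="cluster_version"}']
    sample_collected = {"up", "node_load1"}
    print(compare(sample_collected, sample_matchers))
```

A non-empty `missing` list would mean the profile drops something telemetry still needs; a non-empty `extra` list would mean the profile collects more than it has to.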

This profile only caters to metrics that telemetry rules rely upon.

Signed-off-by: Pranshu Srivastava <[email protected]>
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) Oct 5, 2025
openshift-ci-robot commented Oct 5, 2025

@rexagod: This pull request references MON-4386 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.

This pull request references MON-4387 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.21.0" version, but no target version was set.


openshift-ci bot commented Oct 5, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference label (indicates that this PR references a valid Jira ticket of any type) Oct 5, 2025
openshift-ci bot commented Oct 5, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rexagod

@openshift-ci openshift-ci bot added the approved label (indicates a PR has been approved by an approver from all required OWNERS files) Oct 5, 2025
@rexagod rexagod force-pushed the MON-4386,MON-4387 branch from 9476550 to 9c9b657 Compare October 5, 2025 15:06
Adds in-cluster monitoring stack's telemetry metrics under the
`telemetry` profile.

Signed-off-by: Pranshu Srivastava <[email protected]>
@rexagod rexagod force-pushed the MON-4386,MON-4387 branch from 9c9b657 to d5f9e7e Compare October 5, 2025 15:07
@@ -1,5 +1,9 @@
# Note: This CHANGELOG is only for the monitoring team to track all monitoring related changes. Please see OpenShift release notes for official changes.

## 4.XX
@rexagod rexagod Oct 5, 2025

Using 4.XX here, since adding external support may take a while.

@rexagod rexagod changed the title from "MON-4386,MON-4387: Introduce telemetry profile and add support for in-house components" to "MON-4386,MON-4387: telemetry Collection Profile" Oct 14, 2025
openshift-ci bot commented Dec 3, 2025

@rexagod: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/misspell | d5f9e7e | link | true | /test misspell |


@openshift-merge-robot openshift-merge-robot added the needs-rebase label (indicates a PR cannot be merged because it has merge conflicts with HEAD) Dec 3, 2025
PR needs rebase.


rexagod commented Dec 11, 2025

I wanted to discuss the expected nature of the telemetry collection profile.

The minimal collection profile started off by introducing corresponding monitors for our own components. External component owners can extend these with monitors of their own to cast a wider net and cut down further on metric ingestion.

The telemetry collection profile veers off from this convention, as it demands a complete set of telemetry monitors (to fulfill the exact set of metrics required by the telemetry rules) from the get-go. This has already been done for internal components in this draft, but doing so for external components is complicated, even now that we have the complete list of metrics.

If CMO runs with remote-write capability (no telemeter-client), targets could be injected with the same relabelings that are applied for telemetry. This may have performance implications, since every target would have its metrics filtered through the same relabeling regexes. It would, however, be more dynamic than keeping a record of the monitor keys needed for telemetry, which wouldn't be immune to breaking changes (renames or migrations). As things stand, though, we need an approach that covers not only this path but also the legacy (federated) telemeter-client-based one: users should see the cut-down in metric volume not just in their LTS storage systems, but also in the PVs hooked up for data retention. That brings us back to constraining at the monitor level.
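To make the performance concern concrete, here is a small Python sketch of what a `keep`-style relabeling on `__name__` amounts to. It is an illustration only: the helper names and the sample data are hypothetical, and it models the fully anchored regex match that Prometheus relabeling performs rather than any actual CMO code.

```python
import re


def build_keep_regex(metric_names):
    """Build one alternation over all telemetry metric names.

    Names are escaped so they are matched literally.
    """
    return re.compile("|".join(re.escape(name) for name in sorted(metric_names)))


def filter_samples(samples, keep_regex):
    """Keep only samples whose metric name fully matches the regex.

    fullmatch mirrors the anchored matching of relabeling rules. Note that
    every scraped sample of every target pays for one regex evaluation.
    """
    return [s for s in samples if keep_regex.fullmatch(s["__name__"])]


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    keep = build_keep_regex({"up", "cluster_version"})
    scraped = [
        {"__name__": "up", "job": "a"},
        {"__name__": "up_total", "job": "a"},
        {"__name__": "node_load1", "job": "b"},
    ]
    print(filter_samples(scraped, keep))
```

The alternation grows with the telemetry metric list, and it is evaluated on every target regardless of whether that target exposes any telemetry metric at all, which is the overhead the monitor-level approach avoids.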

A more manageable and fitting way would be to adhere to the same pattern as the minimal collection profile and start by constraining metrics within the monitoring stack, i.e., with the monitors we control. Teams would be encouraged (in the handbook documentation, and maybe through an AI-based check on PRs) to curate monitors on their end with a specific annotation, so that the telemetry metric volume eventually comes closer to what the telemetry rules really need. This wouldn't be subject to breaking changes, and it could happen at its own pace across the wider OpenShift ecosystem.
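A sketch of how opt-in monitors could be picked up under this pattern, assuming monitors are plain dictionaries shaped like Kubernetes objects. The annotation key `example.com/collection-profile` is a placeholder I made up for the example; the real key, and whether it ends up being an annotation or a label, is not decided here.

```python
# Hypothetical key: the real marker used to opt a monitor into a profile may differ.
PROFILE_KEY = "example.com/collection-profile"


def monitors_for_profile(monitors, profile):
    """Return the names of monitors that opted into the given profile.

    Monitors without the marker are simply ignored, so teams can adopt
    the profile at their own pace without breaking anything.
    """
    selected = []
    for monitor in monitors:
        metadata = monitor.get("metadata", {})
        annotations = metadata.get("annotations") or {}
        if annotations.get(PROFILE_KEY) == profile:
            selected.append(metadata["name"])
    return selected


if __name__ == "__main__":
    # Hypothetical monitors, for illustration only.
    sample = [
        {"metadata": {"name": "in-stack-telemetry", "annotations": {PROFILE_KEY: "telemetry"}}},
        {"metadata": {"name": "in-stack-minimal", "annotations": {PROFILE_KEY: "minimal"}}},
        {"metadata": {"name": "external-unmarked"}},
    ]
    print(monitors_for_profile(sample, "telemetry"))
```

Because selection depends only on the marker and not on a recorded list of monitor keys, renaming or migrating a monitor would not break it.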

@simonpasquier Curious to hear your thoughts on this. I believe following the minimal profile's pattern makes the most sense going forward for this collection profile, but please let me know if I missed something.

rexagod commented Dec 16, 2025

⬆️ This was resolved over Slack. We'll move forward with the same approach as we did for the minimal profile.
