-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Add Guidance wrt Labelling to Naming and Rules Best Practices #2691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Conall O'Brien <[email protected]>
Signed-off-by: Conall O'Brien <[email protected]>
Obligatory post-it note reminder: https://photos.app.goo.gl/Bkfir4wRiLtNVG4W8 |
With my current patchy availability, there is little chance I get to this anytime soon. Maybe @juliusv has a qualified opinion here? |
Hey @conallob, congrats for the nice PR! Just one thing: maybe you could fix the
There is a small confusion I would love to see fixed, in a paragraph just below one of your edits. It is:
I don't understand the |
Signed-off-by: Conall O'Brien <[email protected]>
Fixed the typo.
I'm afraid that best practice is unrelated. It also makes sense as written, once you've written enough rules. It's weighing up the trade-off between tracking the chain of operations across a pipeline of rules vs the rule name growing unwieldy. Many of these best practices trace back to specific philosophies from Prometheus' predecessor. If you still think it needs a polish, please a separate doc bug. |
Co-authored-by: Ben Kochie <[email protected]> Signed-off-by: Conall O'Brien <[email protected]>
Co-authored-by: Ben Kochie <[email protected]> Signed-off-by: Conall O'Brien <[email protected]>
Keeping the metric name unchanged makes it easy to know what a metric is and | ||
easy to find in the codebase. | ||
|
||
IMPORTANT: `job` label acts as a primary key. It is **strongly** recommended that you use it to scope your PromQL expressions to the system you are monitoring. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is misleading. Prometheus doesn't have the concept of "primary key". Not even metric names are a "primary key".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, especially since folks used to SQL DBs will jump to the conclusion that it's a SQL DB, which it isn't.
Iterated on the language to avoid creating ambiguity
Iterate on the description of the job label, removing "primary key", given it's association with SQL Signed-off-by: Conall O'Brien <[email protected]>
PTAL |
Friendly ping? |
Friendly, you're not at SRECon EMEA this week, ping? |
WARNING: When using `without`, be careful not to strip out the `job` label accidentally. | ||
|
||
* `instance` | ||
* The `instance` label will include the `ip:port` what was scraped, providing a crucial breadcrumb for debugging scrape time issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* The `instance` label will include the `ip:port` what was scraped, providing a crucial breadcrumb for debugging scrape time issues | |
* The `instance` label by default will include the `ip:port` what was scraped. |
## Labels | ||
|
||
* `job` | ||
* The `job` label is one of the few ubiquitious labels, set at scrape time, and is used to identify metrics scraped from the same target/exporter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* The `job` label is one of the few ubiquitious labels, set at scrape time, and is used to identify metrics scraped from the same target/exporter. | |
* The `job` is a default target label set by the scrape configs and is used to identify metrics scraped from the same target/exporter. |
|
||
* `job` | ||
* The `job` label is one of the few ubiquitious labels, set at scrape time, and is used to identify metrics scraped from the same target/exporter. | ||
* If not specified in PromQL expressions, they will match unrelated metrics with the same name. This is especially true in a multi system or multi tenant installation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is really a useful note here, as this applies to all label matching.
* If not specified in PromQL expressions, they will match unrelated metrics with the same name. This is especially true in a multi system or multi tenant installation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It applies to all labels. But job
and instance
are two uniform labels found on every metric, including ubiquitous synthetic metrics such as up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it's not related to job, but related to "target labels" and discovery. That is a different thing and related to querying, not creating labels.
WARNING: When using `without`, be careful not to strip out the `job` label accidentally. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This warning doesn't make a lot of sense to me. It has a high probability of being quoted as copy-pasta without being understood. Let's just drop it.
WARNING: When using `without`, be careful not to strip out the `job` label accidentally. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't want to encourage copy-pasta, but this is an important point.
If using alerting expressions like up{job=bla} > 0 for 3m
, you need to be careful not to accidentally strip the job
label. If you do, your alert no longer works as intended.
I'll polish the wording here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem related to naming practices, which is what this guide is about.
Co-authored-by: Ben Kochie <[email protected]> Signed-off-by: Conall O'Brien <[email protected]>
For perspective, one of the motivations behind this PR is the anti-patterm of writing alert expressions intended for a single tenant system, which has evolved into a multi-tenant system. e.g Once you start adding additional jobs that match on the same labels (e.g Daemonsets, fleet-wide node_exporter, etc), teams start getting paged for systems they don't own or care about |
Add Guidance wrt Labelling to Naming and Rules Best Practices to docs/practices/naming.md and docs/practices/rules.md, specifically:
job
andinstance
job
label, especially in multi-tenant systemsThis Fixes #2690