-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Change metric naming, unit, and http fields #213
base: main
Are you sure you want to change the base?
Conversation
internal/scti/handlers.go
Outdated
@@ -68,48 +68,48 @@ var ( | |||
lastSCTTimestamp *prometheus.GaugeVec // origin => value | |||
reqsCounter *prometheus.CounterVec // origin, op => value | |||
rspsCounter *prometheus.CounterVec // origin, op, code => value | |||
rspLatency *prometheus.HistogramVec // origin, op, code => value | |||
rspDuration *prometheus.HistogramVec // origin, op, code => value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity, what's the thinking behind the name change?
What I think this metric represents is called latency distribution
in, er, other automatic monitoring systems we're intimately familiar with :)
If you're intent on changing it, "response duration" sounds a bit odd to my ears, but how about rspTime
? Also, the description of the metric still explains it in terms of Latency of responses
, so I guess that needs updating to match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, this PR lacked something very important: a link to https://prometheus.io/docs/practices/naming/, I've now added it. I'm 100% onboard with your comments and this looked a bit odd to me at first sight, being used to other monitoring systems... but it seems to be the prometheus way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the intent of the naming doc you linked was the http_
prefix there, and the ..._request_duration_seconds
was just an example.
I found this "counter" (hah!) example in the histogram section of the prom docs:
A common example would be the time it takes to reply to a request, called latency.
So I think there's latitude to fit in with styles we're familiar with.
internal/scti/handlers.go
Outdated
prometheus.HistogramOpts{ | ||
Name: "http_latency", | ||
Name: "ct_http_request_duration_seconds", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: variable naming is in terms of "response", but metric name is now in terms of "request" - would be good to keep them consistent if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, good point!
Towards #105.
Following recommendations from: https://prometheus.io/docs/practices/naming/
Prefix all the metrics with
ct_
, and suffix with units. Also, use seconds for the SCT timestamp unit since this is what guidelines recommend.... even if an SCT timestamp is in milliseconds... I'm a bit on the fence for this one, but these metrics are meant for graphs and human consumption, so seconds seem to more adapted indeed (and for anything more fancy, milliseconds are still available since the number exported is a float).Note that I'm explicitly specifying the
ct_
prefix manually and not using a wrapper, not to break code searching: I find that searching for a metric name is very useful when investigating an outage.