-
-
Notifications
You must be signed in to change notification settings - Fork 95
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce the metrics cardinality #160
Comments
This is a major issue when you identify your @umputun This makes the metrics endpoint unusable for large production systems. |
Can you provide a PR for this change? |
I have been doing this type of work already for an issue with fastly/fastly-exporter#152 which has resulted in the solution fastly/fastly-exporter#153. @umputun Let me know which direction you'd like to go in based on the feedback provided by @ShoshinNikita. My question is mostly around do we drop the field or keep the field with the value of Can you also provide your preferred name for the CLI argument to toggle this feature? I would only plan to add this specific feature and not the other recommendations from @ShoshinNikita. |
I suggest not to use Also I suggest to add label "server" to all metrics to follow a unified approach and to make buckets for I can make a PR with this improvements |
sounds like a good idea to me. Probably should be some new cil/env option(s) to turn this "low cardinality" metrics on |
Right now the metric
http_response_time_seconds
has one label -path
, andr.URL.Path
is used as its value. This can be an issue when a service behind Reproxy uses dynamic urls (for example,/api/v1/users/{user_id}
). In this case, the response size of/metrics
may become too large to process by Prometheus (especially when resources are limited).We can't just drop this label because of the backward compatibility promise. However, we can make it optional. It would be enabled by default - again, to comply with the backward compatibility promise.
I also have a few more suggestions (more → less important):
server
to all metrics. It feels wrong that onlyhttp_requests_total
has this label. For example, it prevents me from creating an alert for 500+ statuses for specific upstreams.http_request_size_bytes{server}
andhttp_response_size_bytes{server}
that would allow users to monitor the size of requests and responses.http_response_time_seconds
- for example,10
and30
. The current "highest" bucket is5s
, but it can take much longer to process some types of requests. At the same time, I understand that it's hard to find a good and fitting maximum value. Another option it to make the buckets configurable - but I am not sure it would be practical.The text was updated successfully, but these errors were encountered: