What problem are you trying to solve?
When debugging whether a controller is congested, we can measure how long our own reconcilers take to process an event using something like #[instrument] on the reconciler fn, but we do not have access to how long an event spends in scheduling before it hits our reconciler, nor do we have information about how deep the queue is.
Having these numbers would be useful because it lets us tune controller parallelism more accurately and, as a result, scale vertically as appropriate.
Describe the solution you'd like
If it's something that can be exposed, then maybe two synchronised numbers are sufficient, but they might be quite difficult to actually thread through from Controller -> scheduler...
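To make the "two synchronised numbers" idea concrete, here is a minimal std-only sketch. Everything in it is hypothetical: `QueueSignals`, `on_enqueue` and `on_dequeue` are illustrative names, not existing kube-runtime APIs, and it assumes the scheduler can call a hook when an event is queued and another when it is handed to the reconciler.

```rust
use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical holder for the two signals discussed above.
/// Not part of kube-runtime; names are for illustration only.
#[derive(Default)]
pub struct QueueSignals {
    /// Number of events currently waiting in the queue.
    depth: AtomicUsize,
    /// Scheduling delay of the most recently dequeued event, in microseconds.
    last_wait_micros: AtomicU64,
}

impl QueueSignals {
    /// Assumed hook: called when an event enters the queue.
    /// Returns the enqueue timestamp so the caller can carry it with the event.
    pub fn on_enqueue(&self) -> Instant {
        self.depth.fetch_add(1, Ordering::Relaxed);
        Instant::now()
    }

    /// Assumed hook: called when the event is handed to the reconciler.
    pub fn on_dequeue(&self, enqueued_at: Instant) {
        self.depth.fetch_sub(1, Ordering::Relaxed);
        let waited = enqueued_at.elapsed();
        self.last_wait_micros
            .store(waited.as_micros() as u64, Ordering::Relaxed);
    }

    /// Current queue depth.
    pub fn depth(&self) -> usize {
        self.depth.load(Ordering::Relaxed)
    }

    /// How long the most recently dequeued event waited before being scheduled.
    pub fn last_wait(&self) -> Duration {
        Duration::from_micros(self.last_wait_micros.load(Ordering::Relaxed))
    }
}
```

A shared `Arc<QueueSignals>` could then be read from a metrics endpoint without taking any locks. The hard part noted above, threading the hooks from Controller down to the scheduler, is not addressed by this sketch.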
Describe alternatives you've considered
A feature-flagged metrics module using prometheus_client. This is the official Rust Prometheus client and it seems relatively light, so it should be fine behind an optional feature flag if we want to go down this route. We should try to do kube-rs/controller-rs#55 first if this is desirable.
There is another crate that is well used and very optimized (prometheus), but it is lacking too many features and hasn't been released in a year, so it is hard to recommend these days.
Documentation, Adoption, Migration Strategy
docs on scaling + metrics on kube.rs
release notes
imagine this will be purely additive
Target crate for feature
kube-runtime
Ok, so in addition to the currently used prometheus crate, I have now tested out two more Prometheus libraries: the official prometheus_client crate, and a more unhinged, optimization-focused metrics library: measured.
The TL;DR for me is that there are currently complex tradeoffs in choosing a metrics library: "do you want exemplars, do you care about application memory use/perf, the cardinality cost on your cluster's Prometheus, do you care about maintenance of the libraries, do you care about correctness of rate, do you just want it to be easy?" Ideally you should be able to answer all these questions with a YES, but you just cannot, yet.
I wrote up some pros and cons of the two libraries (in particular) in kube-rs/controller-rs#55 and its implementation PRs in kube-rs/controller-rs#71 + kube-rs/controller-rs#72. At the moment I might lean towards using prometheus_client in controller-rs as it's the easiest for users, but ideally I want to use measured, especially for my own more optimised stuff.
Most importantly though, I don't think we should pick one metric library in kube as "the one we support (take it or leave it)" but rather expose signals that can maybe be instrumented into a metrics struct for one or more of those libraries with feature flags.
In short I think we should explore one or more of these (with minimally 1 done):
add kube/metrics feature exposing the core signals as inspectable numbers (with atomics / synchronised timestamps / whatever we need).
add kube/metrics-measured using kube/metrics to expose an interfacing measured metrics struct
2 and 3 aren't that hard to do, so they could be kept in examples, but I am not opposed to having ease-of-use stuff available also (if we are anyway going to test it on CI), as I need it all the time, and it doesn't really add any heavy deps.
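As a rough illustration of keeping kube itself library-agnostic, the core signals could be pushed through a small trait that a metrics struct for any library implements. This is only a sketch: `SchedulerObserver` and `InMemoryMetrics` are made-up names, and the in-memory recorder merely stands in for a real prometheus_client or measured struct, which would live behind its own feature flag.

```rust
use std::sync::Mutex;
use std::time::Duration;

/// Hypothetical library-agnostic receiver for the core scheduler signals.
/// Not an existing kube API.
pub trait SchedulerObserver: Send + Sync {
    /// Reports the queue depth whenever it changes.
    fn queue_depth_changed(&self, depth: usize);
    /// Reports how long an event waited before reaching the reconciler.
    fn event_scheduled(&self, waited: Duration);
}

/// Stand-in for a library-specific metrics struct; it records
/// the maximum depth seen and every observed scheduling delay.
#[derive(Default)]
pub struct InMemoryMetrics {
    inner: Mutex<Inner>,
}

#[derive(Default)]
struct Inner {
    max_depth: usize,
    waits: Vec<Duration>,
}

impl SchedulerObserver for InMemoryMetrics {
    fn queue_depth_changed(&self, depth: usize) {
        let mut inner = self.inner.lock().unwrap();
        inner.max_depth = inner.max_depth.max(depth);
    }

    fn event_scheduled(&self, waited: Duration) {
        self.inner.lock().unwrap().waits.push(waited);
    }
}

impl InMemoryMetrics {
    /// Deepest queue observed so far.
    pub fn max_depth(&self) -> usize {
        self.inner.lock().unwrap().max_depth
    }

    /// Mean scheduling delay, or None if nothing has been observed yet.
    pub fn mean_wait(&self) -> Option<Duration> {
        let inner = self.inner.lock().unwrap();
        if inner.waits.is_empty() {
            return None;
        }
        Some(inner.waits.iter().sum::<Duration>() / inner.waits.len() as u32)
    }
}
```

A real adapter would replace the `Mutex<Inner>` with a gauge for depth and a histogram for delay from the chosen library; the trait boundary is what lets kube avoid picking one.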
clux linked a pull request on Jul 4, 2024 that will close this issue
Would you like to work on this feature?
Maybe
Ref: good article on common metrics for queues