fix: treat prometheus counters as rates in autoscaling signals#1042
fix: treat prometheus counters as rates in autoscaling signals#1042nXtCyberNet wants to merge 2 commits into
Conversation
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Code Review
This pull request updates the metric collector to calculate rates for Prometheus counters by comparing current values with previous snapshots, introducing a new GetLastUnfreshSnapshotWithTimestamp method in the sliding window structure to track scrape intervals. The reviewer suggested improving the precision of rate calculations by using a single consistent timestamp per pod scrape rather than calling the timestamp function multiple times, and recommended refactoring GetLastUnfreshSnapshot to utilize the new method to reduce code duplication.
| collector.processPrometheusString(result, pastHistogramMap, pastCounterMap, currentHistogramMap, currentCounterMap, pastScrapeTimestamp, instanceInfo.MetricsMap) | ||
| (*currentHistograms)[pod.Name] = HistogramInfo{ | ||
| PodStartTime: pod.Status.StartTime, | ||
| HistogramMap: currentHistogramMap, | ||
| PodStartTime: pod.Status.StartTime, | ||
| HistogramMap: currentHistogramMap, | ||
| CounterMap: currentCounterMap, | ||
| ScrapeTimestamp: util.GetCurrentTimestamp(), | ||
| } |
There was a problem hiding this comment.
For improved precision in rate calculation, it's better to determine the current scrape timestamp once per pod and use it consistently. Currently, util.GetCurrentTimestamp() is called inside processPrometheusString for each metric, and again when creating HistogramInfo. This can introduce minor inaccuracies because the timestamp used for the rate calculation (now) will be slightly different from the timestamp stored for the next cycle (ScrapeTimestamp).
To improve this, you can get the timestamp once before processing the metrics and use it in both places. This ensures the elapsed_seconds for the rate calculation is based on the exact interval between the stored scrape timestamps.
Example of the proposed change:
// In fetchMetricsFromPods:
...
result := string(bodyStr)
now := util.GetCurrentTimestamp()
collector.processPrometheusString(result, pastHistogramMap, pastCounterMap, currentHistogramMap, currentCounterMap, pastScrapeTimestamp, now, instanceInfo.MetricsMap)
(*currentHistograms)[pod.Name] = HistogramInfo{
PodStartTime: pod.Status.StartTime,
HistogramMap: currentHistogramMap,
CounterMap: currentCounterMap,
ScrapeTimestamp: now,
}
...
// And update processPrometheusString signature and body:
func (c *MetricCollector) processPrometheusString(..., pastScrapeTimestamp int64, now int64, instanceMetricMap algorithm.Metrics) {
// ...
// inside case ..._COUNTER:
// remove: now := util.GetCurrentTimestamp()
// ...
}| func (window *SnapshotSlidingWindow[T]) GetLastUnfreshSnapshotWithTimestamp() (value T, timestamp int64, ok bool) { | ||
| if window.freshMilliseconds == 0 { | ||
| return value, 0, false | ||
| } | ||
| currentTimestamp := window.getCurrentTimestamp() | ||
| window.expire(currentTimestamp) | ||
| if window.pool.Len() == 0 { | ||
| return value, 0, false | ||
| } | ||
| front := window.pool.Front() | ||
| if isFresh(window.freshMilliseconds, currentTimestamp, front.timestamp) { | ||
| return value, 0, false | ||
| } | ||
| return front.value, front.timestamp, true | ||
| } |
There was a problem hiding this comment.
The logic inside this new function is nearly identical to GetLastUnfreshSnapshot. To reduce code duplication and improve maintainability, GetLastUnfreshSnapshot could be refactored to call this new function and discard the returned timestamp. This would make the code more aligned with the DRY (Don't Repeat Yourself) principle.
|
Thanks for the analysis, can you add some test coverage |
|
From another view, it reflects the current api does not fit all. |
|
Hi @hzxuzhonghu, thanks for the review. |
|
@nXtCyberNet I donot have a good suggestion now, but will deep dive into other scalers first |
|
Okay, I'll wait for your response. In the meantime, I'll add the test coverage you requested. Thanks! |
|
@hzxuzhonghu any updates ? |
What type of PR is this?
/kind bug
/kind enhancement
What this PR does / why we need it:
Prometheus counter metrics are monotonically increasing cumulative values.
The autoscaler was treating these raw cumulative values as instantaneous load
signals, causing two critical failures:
Scale-up runaway: A pod that handled 50 total requests since startup would
trigger scaling to 5 replicas (ceil(50 / target)), regardless of current traffic.
No scale-down: Even after traffic stops, the counter value persists at 50,
preventing the autoscaler from ever scaling back down.
The fix: Track per-pod counter snapshots across scrape cycles and compute the
rate of change (delta/elapsed_seconds) instead of the raw cumulative value.
This correctly reflects instantaneous demand and enables both scale-up and scale-down.
Implementation Details:
CounterMapandScrapeTimestampfields toHistogramInfoto maintainper-pod/metric baseline state across scrape cycles.
GetLastUnfreshSnapshotWithTimestamp()toSnapshotSlidingWindowto expose precise per-pod scrape timestamps (more accurate than window-level timestamps).
CounterMapguards protect in-memory snapshotscreated before this change.
Which issue(s) this PR fixes:
Fixes #1037
Special notes for your reviewer:
Why per-pod timestamps? The sliding-window-level timestamp is too coarse;
per-pod scrape times give us the actual elapsed duration for each counter,
improving rate precision when scrape intervals vary.
Counter reset handling: If a pod restarts, its counter resets to 0.
Detecting
current < previousavoids reporting a massive negative rate;returning 0 is safe and gives the pod a grace period to re-accumulate load signals.
Backward compatibility: Any in-memory snapshots from before this change
will have
CounterMap == nil. These are safely handled with a nil-check guard.Testing recommendation: Add bench tests for rate calculation under
varying scrape intervals and counter reset scenarios.
Does this PR introduce a user-facing change?:
Yes. Autoscaling behavior for counter-based metrics will change—scaling will now
respond to instantaneous rate of change rather than cumulative totals, enabling
proper scale-down behavior.