
fix: treat prometheus counters as rates in autoscaling signals#1042

Open
nXtCyberNet wants to merge 2 commits into
volcano-sh:mainfrom
nXtCyberNet:issue/counter

@nXtCyberNet

Contributor

What type of PR is this?
/kind bug
/kind enhancement
What this PR does / why we need it:

Prometheus counter metrics are monotonically increasing cumulative values.
The autoscaler was treating these raw cumulative values as instantaneous load
signals, causing two critical failures:

  1. Scale-up runaway: A pod that has handled 50 requests in total since startup
    triggers scaling to 5 replicas (ceil(50 / target), e.g. with a target of 10),
    regardless of current traffic.

  2. No scale-down: Even after traffic stops, the counter value persists at 50,
    preventing the autoscaler from ever scaling back down.

The fix: Track per-pod counter snapshots across scrape cycles and compute the
rate of change (delta/elapsed_seconds) instead of the raw cumulative value.
This correctly reflects instantaneous demand and enables both scale-up and scale-down.

Implementation Details:

  • Added CounterMap and ScrapeTimestamp fields to HistogramInfo to maintain
    per-pod/metric baseline state across scrape cycles.
  • New rate calculation in metric collection:
    rate = (current_value - previous_value) / elapsed_seconds
  • Counter resets (current < previous) detected and handled by clamping rate to 0.
  • First scrape returns rate=0 until baseline is established.
  • Added GetLastUnfreshSnapshotWithTimestamp() to SnapshotSlidingWindow
    to expose precise per-pod scrape timestamps (more accurate than window-level timestamps).
  • Backward compatibility: Nil CounterMap guards protect in-memory snapshots
    created before this change.

Which issue(s) this PR fixes:
Fixes #1037

Special notes for your reviewer:

  • Why per-pod timestamps? The sliding-window-level timestamp is too coarse;
    per-pod scrape times give us the actual elapsed duration for each counter,
    improving rate precision when scrape intervals vary.

  • Counter reset handling: If a pod restarts, its counter resets to 0.
    Detecting current < previous avoids reporting a massive negative rate;
    returning 0 is safe and gives the pod a grace period to re-accumulate load signals.

  • Backward compatibility: Any in-memory snapshots from before this change
    will have CounterMap == nil. These are safely handled with a nil-check guard.

  • Testing recommendation: Add bench tests for rate calculation under
    varying scrape intervals and counter reset scenarios.

Does this PR introduce a user-facing change?:
Yes. Autoscaling behavior for counter-based metrics will change: scaling now
responds to the rate of change rather than to cumulative totals, which enables
proper scale-down.

Fix autoscaler counter metric handling: Prometheus counter metrics are now 
correctly treated as rates (delta/elapsed_seconds) instead of raw cumulative 
values. This fixes runaway scale-up and enables proper scale-down when traffic stops.

Signed-off-by: nXtCyberNet <rohantech2005@gmail.com>
Copilot AI review requested due to automatic review settings May 13, 2026 16:34
@volcano-sh-bot added the kind/bug and kind/enhancement labels May 13, 2026
@volcano-sh-bot

Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign git-malu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot AI left a comment

Contributor

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates the metric collector to calculate rates for Prometheus counters by comparing current values with previous snapshots, introducing a new GetLastUnfreshSnapshotWithTimestamp method in the sliding window structure to track scrape intervals. The reviewer suggested improving the precision of rate calculations by using a single consistent timestamp per pod scrape rather than calling the timestamp function multiple times, and recommended refactoring GetLastUnfreshSnapshot to utilize the new method to reduce code duplication.

Comment on lines +188 to 194
collector.processPrometheusString(result, pastHistogramMap, pastCounterMap, currentHistogramMap, currentCounterMap, pastScrapeTimestamp, instanceInfo.MetricsMap)
(*currentHistograms)[pod.Name] = HistogramInfo{
    PodStartTime:    pod.Status.StartTime,
    HistogramMap:    currentHistogramMap,
    CounterMap:      currentCounterMap,
    ScrapeTimestamp: util.GetCurrentTimestamp(),
}

Contributor

medium

For improved precision in rate calculation, it's better to determine the current scrape timestamp once per pod and use it consistently. Currently, util.GetCurrentTimestamp() is called inside processPrometheusString for each metric, and again when creating HistogramInfo. This can introduce minor inaccuracies because the timestamp used for the rate calculation (now) will be slightly different from the timestamp stored for the next cycle (ScrapeTimestamp).

To improve this, you can get the timestamp once before processing the metrics and use it in both places. This ensures the elapsed_seconds for the rate calculation is based on the exact interval between the stored scrape timestamps.

Example of the proposed change:

// In fetchMetricsFromPods:
...
result := string(bodyStr)
now := util.GetCurrentTimestamp()
collector.processPrometheusString(result, pastHistogramMap, pastCounterMap, currentHistogramMap, currentCounterMap, pastScrapeTimestamp, now, instanceInfo.MetricsMap)
(*currentHistograms)[pod.Name] = HistogramInfo{
    PodStartTime:    pod.Status.StartTime,
    HistogramMap:    currentHistogramMap,
    CounterMap:      currentCounterMap,
    ScrapeTimestamp: now,
}
...

// And update processPrometheusString signature and body:
func (c *MetricCollector) processPrometheusString(..., pastScrapeTimestamp int64, now int64, instanceMetricMap algorithm.Metrics) {
    // ...
    // inside case ..._COUNTER:
    // remove: now := util.GetCurrentTimestamp()
    // ...
}

Comment on lines +238 to +252
func (window *SnapshotSlidingWindow[T]) GetLastUnfreshSnapshotWithTimestamp() (value T, timestamp int64, ok bool) {
    if window.freshMilliseconds == 0 {
        return value, 0, false
    }
    currentTimestamp := window.getCurrentTimestamp()
    window.expire(currentTimestamp)
    if window.pool.Len() == 0 {
        return value, 0, false
    }
    front := window.pool.Front()
    if isFresh(window.freshMilliseconds, currentTimestamp, front.timestamp) {
        return value, 0, false
    }
    return front.value, front.timestamp, true
}

Contributor

medium

The logic inside this new function is nearly identical to GetLastUnfreshSnapshot. To reduce code duplication and improve maintainability, GetLastUnfreshSnapshot could be refactored to call this new function and discard the returned timestamp. This would make the code more aligned with the DRY (Don't Repeat Yourself) principle.

@hzxuzhonghu

Member

Thanks for the analysis. Can you add some test coverage?

@hzxuzhonghu

Member

From another angle, this shows that the current API does not fit all cases.

// AutoscalingPolicyMetric defines a metric and its target value for scaling decisions.
type AutoscalingPolicyMetric struct {
	// MetricName defines the name of the metric to monitor for scaling decisions.
	MetricName string `json:"metricName"`
	// TargetValue defines the target value for the metric that triggers scaling operations.
	TargetValue resource.Quantity `json:"targetValue"`
}

@nXtCyberNet

Contributor Author

Hi @hzxuzhonghu, thanks for the review.
I apologize; I didn't fully consider the design implications. Before I proceed, I want to understand the roadmap better. Do you think adding a MetricType field to the API would be a good long-term solution? If you believe it's worth doing, I'm happy to include it in this PR.
However, if adding MetricType would conflict with the existing design (where histogram is the default), I think the best approach is to close this PR for now. Adding counter support without explicit API-level type declaration could introduce ambiguity and break existing behavior.
What are your thoughts on the right path forward?

@hzxuzhonghu

Member

@nXtCyberNet I do not have a good suggestion right now, but I will take a deeper look at other scalers first.

@nXtCyberNet

Contributor Author

Okay, I'll wait for your response. In the meantime, I'll add the test coverage you requested. Thanks!

@nXtCyberNet

Contributor Author

@hzxuzhonghu any updates?


Development

Successfully merging this pull request may close these issues.

Autoscaler treats Prometheus COUNTER metrics as instantaneous values instead of cumulative

4 participants