The change from rate() to irate() is a breaking change #670

Open
rmak-cpi opened this issue Sep 13, 2021 · 10 comments
Labels: keepalive (use to prevent automatic closing)

@rmak-cpi

Submitting an issue so folks encountering the same problem as me can have an easier time finding out what happened. I recently upgraded to kube-prometheus-stack and discovered that CPU data no longer shows on an old Grafana Kubernetes / Compute Resources / Workload dashboard that was hosted somewhere else. It turns out that the following change from rate() to irate() was the cause:

e996e00

In particular, the renaming of node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate to node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate can break any dashboards, rules, etc. that reference the old name. As a (temporary) workaround, I think I can just recreate the old node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate recording rule.
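
One way to see what the rename affects is to search exported dashboards for the old name. The following is a minimal Python sketch, assuming a dashboard has already been loaded into nested dicts and lists. `find_references` and the sample `dashboard` are hypothetical; only the two recording-rule names come from this issue.

```python
OLD_NAME = "node_namespace_pod_container:container_cpu_usage_seconds_total:sum_rate"
NEW_NAME = "node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate"


def find_references(node, needle=OLD_NAME, path="$"):
    """Yield (path, text) for every string value that contains `needle`."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_references(value, needle, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_references(value, needle, f"{path}[{index}]")
    elif isinstance(node, str) and needle in node:
        yield path, node


# Hypothetical dashboard: one panel still uses the old name, one uses the new.
dashboard = {
    "panels": [
        {"title": "CPU (old rule)", "targets": [{"expr": f"sum({OLD_NAME})"}]},
        {"title": "CPU (new rule)", "targets": [{"expr": f"sum({NEW_NAME})"}]},
    ]
}

hits = list(find_references(dashboard))
for location, text in hits:
    print(f"{location}: {text}")
```

Only the first panel is reported, because the old name is not a substring of the new one. The same walk works on any parsed JSON or YAML structure, so it could also be pointed at rule files.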

Please feel free to add brief comments confirming the observation and close it. Thanks!

@tahajahangir

tahajahangir commented Oct 1, 2021

Using irate also does not give better detail. It only makes graphs more random and noisier (it uses only a sample of the points), and the final graph may discard spikes completely, whereas rate always shows the average rate over the interval.
https://valyala.medium.com/why-irate-from-prometheus-doesnt-capture-spikes-45f9896d7832

Original bad PR: #619

@paulfantom
Member

> while rate always shows the average rate over the interval

Averaging removes spikes and thus removes important data. In a lot of cases you want those spikes to be present on graphs. This is in contrast to alerts, where you probably don't want spikes, and where rate (or another statistical method) is a better choice.

More in https://www.robustperception.io/irate-graphs-are-better-graphs

@tahajahangir

tahajahangir commented Oct 12, 2021

> Average removes spikes and thus removes important data.

rate does not remove spikes; it flattens them over a time period. In contrast, irate may remove them completely (as if they had not happened at all).

irate uses only the last two data points in a time range (ignoring all other points), which results in noisier graphs. rate considers the first and last data points.
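
To make the difference concrete, here is a minimal Python sketch under simplifying assumptions. `simple_rate` and `simple_irate` are hypothetical helpers, not Prometheus APIs, and they ignore the counter-reset handling and window extrapolation that the real functions perform. The sample values are invented for illustration.

```python
# Each sample is (timestamp_seconds, counter_value); the values are made up.
# A burst of 30 units happens between t=15 and t=30.
samples = [(0, 0.0), (15, 1.0), (30, 31.0), (45, 32.0), (60, 33.0)]


def simple_rate(window):
    """Average per-second increase between the first and last sample."""
    (t_first, v_first), (t_last, v_last) = window[0], window[-1]
    return (v_last - v_first) / (t_last - t_first)


def simple_irate(window):
    """Per-second increase between the last two samples only."""
    (t_prev, v_prev), (t_last, v_last) = window[-2], window[-1]
    return (v_last - v_prev) / (t_last - t_prev)


burst = (31.0 - 1.0) / (30 - 15)  # peak per-second rate during the burst
averaged = simple_rate(samples)   # burst is spread over the whole window
instant = simple_irate(samples)   # burst lies outside the last two samples

print(f"peak={burst:.3f} rate={averaged:.3f} irate={instant:.3f}")
```

With these invented samples the burst peaks at 2.0 per second, the first-to-last average is 0.55 per second, and the last-two-samples value is about 0.067 per second: the average flattens the burst, while the last-two-samples view does not register it at all.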

@paulfantom
Member

> rate does not remove spikes, it makes them flat over a time period

When it comes to graphs, "removal" and "making spikes flat" are the same. In both cases, you are losing data from the visualization.

@bboreham

It's unpredictable whether the important spikes are at the end of the window, in which case irate can see them, or earlier in the window, in which case irate will ignore them.
I prefer consistent behaviour, and no discarding of points, so rate() is better.
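
The dependence on spike position can be sketched in Python by moving one identical spike through the window. This is an illustration under assumptions: the 15-second scrape interval, the increments, and the helper names are all invented, and the helpers omit the extrapolation and counter-reset handling of the real rate() and irate().

```python
from itertools import accumulate

STEP = 15  # assumed scrape interval in seconds


def build_counter(increments):
    """Turn per-interval increments into (timestamp, counter_value) samples."""
    values = [0.0, *accumulate(increments)]
    return [(i * STEP, v) for i, v in enumerate(values)]


def first_to_last(window):
    """Simplified rate(): slope between the first and last sample."""
    (t0, v0), (t1, v1) = window[0], window[-1]
    return (v1 - v0) / (t1 - t0)


def last_two(window):
    """Simplified irate(): slope between the last two samples."""
    (t0, v0), (t1, v1) = window[-2], window[-1]
    return (v1 - v0) / (t1 - t0)


results = []
for spike_at in range(4):
    increments = [1.0] * 4
    increments[spike_at] = 30.0  # same spike, different position in the window
    window = build_counter(increments)
    results.append((spike_at, first_to_last(window), last_two(window)))

for spike_at, avg, inst in results:
    print(f"spike in interval {spike_at}: rate={avg:.3f} irate={inst:.3f}")
```

In this sketch the first-to-last value is the same wherever the spike falls, while the last-two-samples value reflects the spike only when it lands in the final interval.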

github-actions bot added the stale label Oct 15, 2024
@skl
Collaborator

skl commented Oct 16, 2024

Duplicate of #679

skl marked this as a duplicate of #679 Oct 16, 2024
skl closed this as not planned Oct 16, 2024
@bboreham

This one is about the name changing, while #679 is about the behaviour.

@skl
Collaborator

skl commented Oct 16, 2024

@bboreham ah ok, I was thinking about contributing a rate version of all the existing irate recording rules - to maintain backwards compatibility and give users the choice. I thought that might address both tickets, hence marking as dupe. wdyt?

@bboreham

Duplicating the recording rules would indeed address the breaking change, relating to other uses of the data.
To fix #679 would require changing the dashboards.

skl reopened this Oct 17, 2024
skl added the keepalive label and removed the stale label Oct 17, 2024
skl self-assigned this Oct 17, 2024