I may misunderstand `state_agg`, as it seems to calculate only deltas, i.e. state transitions, but it never considers the final state. To demonstrate, a quick test from the examples:
```sql
CREATE TABLE email_status (
    id     BIGINT,
    ts     TIMESTAMPTZ,
    status TEXT
);
SELECT create_hypertable('email_status', 'ts');

INSERT INTO email_status ("ts", "id", "status")
VALUES ('2023-10-24 12:01:00', 1, 'draft');

SELECT time_bucket('5 minutes'::interval, ts) AS bucket,
       id,
       duration_in(state_agg(ts, status), 'draft')
FROM email_status
GROUP BY bucket, id;
```
There's no state transition here, so the reported duration is 0. Now we add the first "state transition" to another state, two minutes into the bucket:
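(The follow-up insert along these lines; the exact timestamp is assumed here — say the row moves to `saved` at 12:02, two minutes into the 12:00–12:05 bucket, consistent with the "3 minutes remaining" expectation below:)

```sql
-- Assumed follow-up insert: the email moves from 'draft' to 'saved'
-- two minutes into the 12:00-12:05 bucket.
INSERT INTO email_status ("ts", "id", "status")
VALUES ('2023-10-24 12:02:00', 1, 'saved');
```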
We get the time spent in `draft` up to that transition. Now the issue is when:
a) state transitions don't happen within one bucket (see #652)
b) state transitions end at some point: how do we calculate the duration in relation to the bucket time/origin?
Now if we query for the `saved` state instead:

```sql
SELECT time_bucket('5 minutes'::interval, ts) AS bucket,
       id,
       duration_in(state_agg(ts, status), 'saved')
FROM email_status
GROUP BY bucket, id;
```
Surely we can reasonably expect the duration to be the remaining time in the bucket, i.e. 3 minutes when the state was entered at 2 minutes, as that's the duration of that state within that single bucket (assuming we don't run the query in the middle of the bucket, but 5 minutes later). But this is what we get: back to zero.
Basically I think there should be a flag, or even the default behavior, for durations where the `end_time` is NULL: return `bucket_width - (start_time - bucket_time)`, or `now() - start_time` if it's an ongoing real-time bucket.
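For the example above, that proposed semantics can be hand-rolled for the simple case (a sketch only: it credits just the last state in each bucket, and the remaining-time expression is my own, not a toolkit API):

```sql
-- Sketch: credit the last state in each bucket with the time remaining
-- until the bucket ends, i.e. bucket_width - (start_time - bucket_time).
SELECT time_bucket('5 minutes', ts) AS bucket,
       id,
       last(status, ts) AS open_state,
       time_bucket('5 minutes', ts) + INTERVAL '5 minutes' - max(ts)
           AS open_state_duration
FROM email_status
GROUP BY 1, 2;
```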
How have I tried to gapfill or otherwise solve this?
I tried `time_bucket_gapfill` with `locf` carrying a status forward into the following buckets, done in 1 s or 1 ms buckets, which can then be combined with a 5-minute `time_bucket`. This works, but for starters it is very slow and impractical for any larger time windows.
SELECT "bucket",
"uuid",
duration_in("statusAgg", 'ACTIVE') AS "activeTime"
FROM
(SELECT time_bucket(INTERVAL '5 minutes', "time") AS "bucket",
"uuid",
state_agg("time", cast_to_text("status")) AS "statusAgg"
FROM
(select time_bucket_gapfill(INTERVAL '1 s', "time") AS "time",
"uuid",
locf(last("status", "time")) as "status"
FROM "StatusHistory"
WHERE "time" >= NOW() - INTERVAL '12 hours' AND "time" <= NOW()
AND "uuid"::text = '83e5858c-202c-43d0-a35e-00c28ee87152'
GROUP by 1, 2) sub
GROUP BY 1, 2) ss
ORDER BY "bucket" DESC
One issue here is that continuous aggregates don't support `time_bucket_gapfill`, so this can't be turned into a cagg as far as I can tell. If there's any way to improve this, please let me know. Secondly, this query only covers the last 12 hours, so if the final state change happened before that window, the duration will be wrong.
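That second problem can at least be patched with `locf`'s optional `prev` argument, seeding the carry-forward with the last status from before the window (a sketch of just the inner query):

```sql
-- Seed locf with the last known status from before the 12-hour window,
-- so a state change older than the window is still carried forward.
SELECT time_bucket_gapfill(INTERVAL '1 s', "time") AS "time",
       "uuid",
       locf(last("status", "time"),
            (SELECT h."status"
             FROM "StatusHistory" h
             WHERE h."uuid"::text = '83e5858c-202c-43d0-a35e-00c28ee87152'
               AND h."time" < NOW() - INTERVAL '12 hours'
             ORDER BY h."time" DESC
             LIMIT 1)) AS "status"
FROM "StatusHistory"
WHERE "time" >= NOW() - INTERVAL '12 hours' AND "time" <= NOW()
  AND "uuid"::text = '83e5858c-202c-43d0-a35e-00c28ee87152'
GROUP BY 1, 2;
```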
Our first plan was to simply do hierarchical caggs like:
```sql
CREATE MATERIALIZED VIEW "HistoryAgg5Minute"
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '5 minutes', "time") AS "bucket",
       uuid,
       state_agg("time", cast_to_text("status")) AS "statusAgg"
FROM "StatusHistory"
GROUP BY 1, 2
WITH NO DATA;
```
and stack 1-hour, 1-week, etc. aggregates on top of it (see the rollup sketch below). We expected it would be as simple as that, and that we could then use `duration_in` to get how long each status lasted within each bucket - but of course the issue is partly also that there's no bucket for every time span either. That could have been tolerated, perhaps by programmatically filling in the blanks (although that's hard too), if at least the bucket in which the final state transition happens reflected the state correctly.
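For reference, the stacking itself would presumably look like this (a sketch, assuming `rollup()` over `state_agg` behaves as the toolkit docs describe):

```sql
-- Hypothetical 1-hour cagg stacked on the 5-minute one via rollup().
CREATE MATERIALIZED VIEW "HistoryAgg1Hour"
WITH (timescaledb.continuous) AS
SELECT time_bucket(INTERVAL '1 hour', "bucket") AS "bucket",
       "uuid",
       rollup("statusAgg") AS "statusAgg"
FROM "HistoryAgg5Minute"
GROUP BY 1, 2
WITH NO DATA;
```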
I'm not the only one perplexed by this behavior:
- #652
- #479
- https://www.timescale.com/forum/t/trouble-getting-interpolated-duration-in-to-show-expected-results/2059
- https://www.timescale.com/forum/t/odd-results-with-state-aggregation-duration-in-across-time-bucket-boundaries/2156
To sum up, quoting the docs for `compact_state_agg`, which I believe should function similarly:

> For example, you can use the compact_state_agg functions to track how much time a system spends in error, running, or starting states.

If the system reports once that it's `running`, the aggregation reports that the duration it has been running is 0 when there's no transition after this state. I don't think it should work like this; it's more of a bug that the ending isn't handled.
EDIT: OK, figured it out. We can gapfill manually with an `add_job` that, at the beginning of each new bucket, creates a row in the history table for every item - wasteful/inefficient, as we can't just gapfill on the aggregate itself. Then `duration_in` (or perhaps the interpolated variant, but we can just sum) can be used to calculate the ending duration up to `now()` or to the end of the bucket. Wastes a lot of space!
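A sketch of such a job (the procedure name and the 5-minute cadence are my own, not from the toolkit):

```sql
-- Hypothetical carry-forward job: at the start of every 5-minute bucket,
-- re-insert the last known status for each item so that every bucket
-- contains at least one row per item. Wasteful, but closes open states.
CREATE OR REPLACE PROCEDURE carry_forward_status(job_id INT, config JSONB)
LANGUAGE sql AS $$
    INSERT INTO "StatusHistory" ("time", "uuid", "status")
    SELECT time_bucket(INTERVAL '5 minutes', now()),
           "uuid",
           last("status", "time")
    FROM "StatusHistory"
    GROUP BY "uuid";
$$;

SELECT add_job('carry_forward_status', INTERVAL '5 minutes');
```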
What the ideal query would be:

```sql
CREATE MATERIALIZED VIEW "HistoryAgg5Minute"
WITH (timescaledb.continuous) AS
SELECT time_bucket_gapfill(INTERVAL '5 minutes', "time") AS "bucket",
       uuid,
       state_agg("time", locf("status")) AS "statusAgg"
FROM "StatusHistory"
GROUP BY 1, 2
WITH NO DATA;
```
If only gapfill supported caggs and `locf` could be nested like this inside `state_agg`. I think `locf` currently doesn't support that, and I'm not sure whether there could be another aggregate that would aggregate `locf(status)`.
So perhaps this PR isn't valid, as `interpolated_duration_in` mostly does what I described, and this was mostly about the experience of using these. Feel free to close.
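For anyone landing here, the shape of that query over the 5-minute cagg would be roughly the following (the argument order of `interpolated_duration_in` has shifted between toolkit versions, so treat this as illustrative, not canonical):

```sql
-- Illustrative: interpolate each bucket's open state using the previous
-- bucket's aggregate, so time in 'ACTIVE' is credited across boundaries.
SELECT "bucket",
       "uuid",
       interpolated_duration_in(
           "statusAgg",
           'ACTIVE',
           "bucket",
           INTERVAL '5 minutes',
           LAG("statusAgg") OVER (PARTITION BY "uuid" ORDER BY "bucket")
       ) AS "activeTime"
FROM "HistoryAgg5Minute"
ORDER BY "bucket" DESC;
```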
r#"INSERT INTO test VALUES
('2020-01-01 00:00:00+00', 'one'),
('2020-01-01 00:02:00+00', 'one'),
('2020-01-01 00:01:00+00', 'two'),
('2020-12-31 00:02:00+00', 'end')
where in the real world there is state end? Compact_state_agg is a good idea and also would be nice to have rollup(): StateAgg -> CompactStateAgg as a way to compact hierarchical aggs.