Skip to content
This repository was archived by the owner on Apr 2, 2024. It is now read-only.

Aggregates for downsampling caggs #602

Draft
wants to merge 17 commits into
base: feature_metric_rollup
Choose a base branch
from
Draft

Conversation

jgpruitt
Copy link
Collaborator

@jgpruitt jgpruitt commented Jan 3, 2023

Description

  • counter_reset_sum
  • irate

Merge requirements

Please take into account the following non-code changes that you may need to make with your PR:

  • CHANGELOG entry for user-facing changes
  • Updated the relevant documentation

Harkishen-Singh and others added 12 commits December 15, 2022 17:08
Signed-off-by: Harkishen-Singh <[email protected]>

This commit does 3 things on a high level.
1. create_metric_rollup is a straightforward that creates schema based on rollup name, prefixed with 'ps_'.

2. Functions to create metric rollup depending on metric type. This is done by
a function scan_for_new_rollups that runs every 30 mins. This function looks for
pending metrics (by peeking inside metric_with_rollup table) that need rollups
and creates them. The creation path is
scan_for_new_rollups -> create_metric_rollup_view (decides the metric type) -> create_rollup_for_{gauge|counter|summary}.
Counter and Histogram have similar columns, hence handled by the same function.

3. Adds set of utility functions like counter_reset_sum and irate, which are required for rollup creation. These
are temporary functions and they MUST be removed before the MVP.

Note: These functions are just dummies, their behaviour is not 100% correct.
Example: These functions require ordered values array, but we cannot do ORDER BY in Caggs query.
Hence, these functions will be handled by SQL aggregate written in Rust.
Signed-off-by: Harkishen-Singh <[email protected]>
Signed-off-by: Harkishen-Singh <[email protected]>

This commit does 2 things:
1. Update the prom_api.register_metric_view() to accept refresh_interval information
2. Creates a Cagg refresher job that is per resolution. This job is created by `create_cagg_refresh_job`
after confirming no job related to the given resolution exists for refresh

Note:
1. We need to register the newly created metric-rollups using register_metric_view(). This is done in
the `scan_for_new_rollups` after storing the new Caggs metadata in a temp table. This is to register the Caggs
only after all views are created, otherwise we get a Error while creating materialized view, saying the Caggs
already exists.

2. In register_metric_view() table, we skip the checks offered by `get_first_level_view_on_metric()` since based
on a quick investigation, these checks do not work when the Caggs are created with `timescaledb.materialized=true`. Moreover,
since we are creating Caggs internally in metric-rollups, we need not worry about Cagg view created (which is done by those checks).
Signed-off-by: Harkishen-Singh <[email protected]>

This commit does 2 things: handle compression and retention for metric-rollups and
custom Caggs for downsampling. The comments on `How?` are present as comments
on the respective functions.
…ompressing Caggs.

Signed-off-by: Harkishen-Singh <[email protected]>

This PR does 2 things:

1. Fixes the exception Caggs already exists that was caused when a cagg for one
resolution was again created when the `scan_for_new_rollups()` was re-executed. Added a
test under `metric_rollups_scan_for_new_rollups` to ensure correct behaviour.

2. `execute_caggs_compression_policy()` used to panic when looping over already compressed
chunk of a Cagg that has some uncompressed chunks that need compression. Added rerunning
compression policy to ensure correct behaviour.
@jgpruitt jgpruitt self-assigned this Jan 3, 2023
@jgpruitt jgpruitt requested a review from a team as a code owner January 3, 2023 22:11
@jgpruitt jgpruitt marked this pull request as draft January 3, 2023 22:12
}
}

#[allow(clippy::too_many_arguments)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since 0.3.x pgx has a native support for aggregates https://github.com/tcdi/pgx/blob/master/pgx-examples/aggregate/src/lib.rs and we ourselves had wanted to move to use it #62

This is not a deal breaker, but it would be my preference if we didn't go the legacy path for newly introduced aggregates.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh. Cool. I didn't realize that.

#[derive(Serialize, Deserialize, PostgresType, Debug)]
#[pgx(sql = false)]
pub struct CounterResetState {
prior: (i64, f64),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be a structure with named fields

@Harkishen-Singh Harkishen-Singh force-pushed the feature_metric_rollup branch 2 times, most recently from 6c3abd9 to ca87ecd Compare January 10, 2023 11:01
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants