This repository was archived by the owner on Apr 2, 2024. It is now read-only.

WIP: Adds support for Prometheus metric-rollups #594

Draft
wants to merge 17 commits into master

Conversation

Harkishen-Singh
Member

Description

TODO:

  • irate() and counter_reset_sum() SQL aggregates

Merge requirements

Please take into account the following non-code changes that you may need to make with your PR:

  • CHANGELOG entry for user-facing changes
  • Updated the relevant documentation

Signed-off-by: Harkishen-Singh <[email protected]>

At a high level, this commit does 3 things.
1. create_metric_rollup is a straightforward function that creates a schema based on the rollup name, prefixed with 'ps_' (a sketch follows this commit message).

2. Functions to create a metric rollup depending on the metric type. This is done by
a function scan_for_new_rollups that runs every 30 minutes. This function looks for
pending metrics (by peeking inside the metric_with_rollup table) that need rollups
and creates them. The creation path is
scan_for_new_rollups -> create_metric_rollup_view (decides the metric type) -> create_rollup_for_{gauge|counter|summary}.
Counter and histogram have similar columns, hence they are handled by the same function.

3. Adds a set of utility functions like counter_reset_sum and irate, which are required for rollup creation. These
are temporary functions and they MUST be removed before the MVP.

Note: These functions are just dummies; their behaviour is not 100% correct.
Example: These functions require an ordered values array, but we cannot do ORDER BY in a Caggs query.
Hence, these functions will eventually be replaced by SQL aggregates written in Rust.
Signed-off-by: Harkishen-Singh <[email protected]>
Signed-off-by: Harkishen-Singh <[email protected]>
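
For illustration, a minimal sketch of the schema-creation step from point 1 above; the function name matches the commit message, but the signature and body are assumptions rather than the PR's actual code:

```sql
-- Hypothetical sketch: create a rollup schema prefixed with 'ps_'.
CREATE OR REPLACE FUNCTION _prom_catalog.create_metric_rollup(_rollup_name TEXT)
RETURNS VOID AS
$$
BEGIN
    -- e.g. rollup name 'short' produces schema 'ps_short'.
    EXECUTE FORMAT('CREATE SCHEMA IF NOT EXISTS %I', 'ps_' || _rollup_name);
END;
$$ LANGUAGE PLPGSQL;
```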

This commit does 2 things:
1. Updates prom_api.register_metric_view() to accept refresh_interval information.
2. Creates a Cagg refresher job per resolution. This job is created by `create_cagg_refresh_job`
after confirming that no refresh job for the given resolution already exists (a sketch follows this commit message).

Note:
1. We need to register the newly created metric-rollups using register_metric_view(). This is done in
`scan_for_new_rollups` after storing the new Caggs' metadata in a temp table, so that the Caggs are registered
only after all views are created; otherwise we get an error while creating the materialized view, saying the Cagg
already exists.

2. In register_metric_view(), we skip the checks offered by `get_first_level_view_on_metric()` since, based
on a quick investigation, those checks do not work when the Caggs are created with `timescaledb.materialized=true`. Moreover,
since we are creating the Caggs internally for metric-rollups, we need not worry about validating the Cagg view (which is what those checks do).
Signed-off-by: Harkishen-Singh <[email protected]>
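
A rough sketch of the per-resolution guard described in point 2, using stock TimescaleDB job APIs; the job procedure name and config shape are assumptions, not the PR's actual code:

```sql
DO $$
DECLARE
    _refresh_interval INTERVAL := INTERVAL '5 minutes';  -- hypothetical resolution
BEGIN
    -- Create a refresh job only if none exists for this resolution yet.
    IF NOT EXISTS (
        SELECT 1
        FROM timescaledb_information.jobs
        WHERE proc_name = 'execute_caggs_refresh_policy'  -- assumed job procedure
          AND schedule_interval = _refresh_interval
    ) THEN
        PERFORM public.add_job(
            '_prom_catalog.execute_caggs_refresh_policy'::regproc,
            _refresh_interval,
            config => jsonb_build_object('refresh_interval', _refresh_interval)
        );
    END IF;
END;
$$;
```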

This commit does 2 things: it handles compression and retention for metric-rollups and
for custom Caggs used in downsampling. The explanations of `How?` are present as comments
on the respective functions.
…ompressing Caggs.

Signed-off-by: Harkishen-Singh <[email protected]>
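
For context, compressing a Cagg and attaching retention with stock TimescaleDB APIs looks roughly like the following; the view name and intervals are illustrative, and the PR's own handling lives in the functions mentioned above:

```sql
-- Enable compression on a rollup Cagg (hypothetical view name).
ALTER MATERIALIZED VIEW ps_short.cpu_usage SET (timescaledb.compress = true);

-- Compress chunks older than an hour (value illustrative).
SELECT public.compress_chunk(c.chunk, if_not_compressed => true)
FROM public.show_chunks('ps_short.cpu_usage', older_than => INTERVAL '1 hour') AS c(chunk);

-- Drop downsampled data after a retention window (value illustrative).
SELECT public.add_retention_policy('ps_short.cpu_usage', drop_after => INTERVAL '90 days');
```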

This PR does 2 things:

1. Fixes the 'Caggs already exists' exception that was caused when a Cagg for one
resolution was created again when `scan_for_new_rollups()` was re-executed. Added a
test under `metric_rollups_scan_for_new_rollups` to ensure correct behaviour (an illustrative guard follows this commit message).

2. `execute_caggs_compression_policy()` used to panic when looping over an already compressed
chunk of a Cagg that still has some uncompressed chunks needing compression. Added rerunning of the
compression policy to ensure correct behaviour.
Signed-off-by: Harkishen-Singh <[email protected]>
Signed-off-by: Harkishen-Singh <[email protected]>
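
An illustrative guard of the kind fix 1 implies: check the catalog before creating a Cagg, so that re-running `scan_for_new_rollups()` is idempotent (schema and view names are hypothetical):

```sql
DO $$
BEGIN
    IF EXISTS (
        SELECT 1
        FROM timescaledb_information.continuous_aggregates
        WHERE view_schema = 'ps_short'   -- hypothetical rollup schema
          AND view_name   = 'cpu_usage'  -- hypothetical metric rollup
    ) THEN
        RAISE NOTICE 'Cagg already exists, skipping creation';
        RETURN;
    END IF;
    -- ... CREATE MATERIALIZED VIEW ... WITH (timescaledb.continuous) ...
END;
$$;
```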

This commit increases the refresh range for a Cagg. This is important since
earlier we accounted for only 1 bucket per refresh. This led to data loss
in systems with many metrics/series, since the refresh took so long
that it overran the refresh range.

This commit addresses the issue by adding a _safety_refresh_start_buffer
that allows the refresh to start from an earlier point, giving a bigger refresh range
in short downsampling scenarios like 5 minutes, thereby preventing sample loss.

This is usually not a problem for long downsampling scenarios like 1 hour.
Signed-off-by: Harkishen-Singh <[email protected]>
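
A minimal illustration of the widened window (the Cagg name and the buffer value are assumptions): instead of refreshing only the most recent bucket, the start of the window is pulled back by the safety buffer.

```sql
-- Refresh [now() - resolution - buffer, now()) rather than a single bucket.
CALL public.refresh_continuous_aggregate(
    'ps_short.cpu_usage',                              -- hypothetical 5-minute rollup
    now() - INTERVAL '5 minutes' - INTERVAL '1 hour',  -- start, pulled back by the buffer
    now()                                              -- end
);
```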
```diff
@@ -2562,7 +2564,7 @@ LANGUAGE PLPGSQL;
 REVOKE ALL ON FUNCTION _prom_catalog.create_metric_view(text) FROM PUBLIC;
 GRANT EXECUTE ON FUNCTION _prom_catalog.create_metric_view(text) TO prom_writer;

-CREATE OR REPLACE FUNCTION prom_api.register_metric_view(schema_name text, view_name text, if_not_exists BOOLEAN = false)
+CREATE OR REPLACE FUNCTION prom_api.register_metric_view(schema_name text, view_name text, refresh_interval INTERVAL, if_not_exists BOOLEAN = false, downsample_id BIGINT = NULL)
```
Member Author


Note to self: Documentation changes referring to refresh_interval.
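
For reference, a call with the new signature might look like this (argument values are illustrative, not from the PR):

```sql
SELECT prom_api.register_metric_view('ps_short', 'cpu_usage', INTERVAL '5 minutes', true, NULL);
```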

```sql
_active_chunk_interval INTERVAL;
_ignore_downsampling_clause TEXT := '';
_refresh_downsampled_data BOOLEAN := (SELECT prom_api.get_global_downsampling_state()::BOOLEAN);
_should_refresh_clause TEXT := 'AND (SELECT should_refresh FROM _prom_catalog.downsample WHERE id = m.downsample_id) = TRUE';
```
Member Author


Should there be a coalesce() here? Otherwise it's a bug, since this is not going to update for custom Caggs when _refresh_downsampled_data is true.

Member Author


Maybe not, since for custom Caggs, downsample_id will be NULL, so this clause has no effect?
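
For context on the NULL semantics under discussion (an illustration, not from the PR): a scalar subquery that matches no rows yields NULL, and `NULL = TRUE` evaluates to NULL, so an AND clause built this way excludes such rows unless wrapped in COALESCE.

```sql
SELECT (SELECT TRUE WHERE FALSE) = TRUE          AS bare_clause;    -- NULL: row filtered out
SELECT COALESCE((SELECT TRUE WHERE FALSE), TRUE) AS with_coalesce;  -- TRUE: row kept
```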
