This repository was archived by the owner on Apr 2, 2024. It is now read-only.
WIP: Adds support for Prometheus metric-rollups #594
Draft
Harkishen-Singh wants to merge 17 commits into master from feature_metric_rollup
Conversation
Force-pushed from 9465a32 to 03ef564 (Compare)
Force-pushed from e3a65bb to c92646a (Compare)
Signed-off-by: Harkishen-Singh <[email protected]>

This commit does 3 things at a high level:
1. create_metric_rollup is a straightforward function that creates a schema based on the rollup name, prefixed with 'ps_'.
2. Adds functions to create a metric rollup depending on the metric type. This is done by scan_for_new_rollups, which runs every 30 minutes, looks for pending metrics that need rollups (by peeking inside the metric_with_rollup table), and creates them. The creation path is scan_for_new_rollups -> create_metric_rollup_view (decides the metric type) -> create_rollup_for_{gauge|counter|summary}. Counter and Histogram have similar columns, hence they are handled by the same function.
3. Adds a set of utility functions like counter_reset_sum and irate, which are required for rollup creation. These are temporary functions and they MUST be removed before the MVP. Note: these functions are just dummies; their behaviour is not 100% correct. Example: these functions require an ordered values array, but we cannot do ORDER BY in a Caggs query. Hence, these functions will be handled by a SQL aggregate written in Rust.
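The dispatch on metric type described in point 2 could look roughly like the following. This is a hypothetical sketch: the function signatures and the metric-type lookup are assumptions based on the commit message, not the actual diff.

```sql
-- Hypothetical sketch of the creation path's dispatch step.
-- The metric-type lookup (_prom_catalog.get_metric_type) is an assumed helper.
CREATE OR REPLACE FUNCTION _prom_catalog.create_metric_rollup_view(rollup_schema TEXT, metric_name TEXT)
RETURNS VOID AS
$$
DECLARE
    _metric_type TEXT;
BEGIN
    SELECT _prom_catalog.get_metric_type(metric_name) INTO _metric_type;
    CASE _metric_type
        WHEN 'GAUGE' THEN
            PERFORM _prom_catalog.create_rollup_for_gauge(rollup_schema, metric_name);
        WHEN 'COUNTER', 'HISTOGRAM' THEN
            -- Counter and Histogram have similar columns, hence one function.
            PERFORM _prom_catalog.create_rollup_for_counter(rollup_schema, metric_name);
        WHEN 'SUMMARY' THEN
            PERFORM _prom_catalog.create_rollup_for_summary(rollup_schema, metric_name);
    END CASE;
END;
$$ LANGUAGE PLPGSQL;
```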
Signed-off-by: Harkishen-Singh <[email protected]>
Signed-off-by: Harkishen-Singh <[email protected]>
Signed-off-by: Harkishen-Singh <[email protected]>
Signed-off-by: Harkishen-Singh <[email protected]>

This commit does 2 things:
1. Updates prom_api.register_metric_view() to accept refresh_interval information.
2. Creates a per-resolution Cagg refresher job. This job is created by `create_cagg_refresh_job` after confirming that no refresh job already exists for the given resolution.

Note:
1. We need to register the newly created metric-rollups using register_metric_view(). This is done in `scan_for_new_rollups` after storing the new Caggs metadata in a temp table, so that the Caggs are registered only after all views are created; otherwise we get an error while creating the materialized view, saying the Cagg already exists.
2. In register_metric_view(), we skip the checks offered by `get_first_level_view_on_metric()` since, based on a quick investigation, these checks do not work when the Caggs are created with `timescaledb.materialized=true`. Moreover, since we create the Caggs internally for metric-rollups, we need not worry about how the Cagg view was created (which is what those checks verify).
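The "create a job only if none exists for this resolution" behaviour might be sketched as below. This is a hedged illustration: `add_job` and the `timescaledb_information.jobs` view are the TimescaleDB API, but the procedure name and the dedup condition are assumptions.

```sql
-- Sketch: register a per-resolution refresh job, skipping duplicates.
CREATE OR REPLACE FUNCTION _prom_catalog.create_cagg_refresh_job(refresh_interval INTERVAL)
RETURNS VOID AS
$$
BEGIN
    -- If a refresh job already runs at this resolution, do nothing.
    IF EXISTS (
        SELECT 1 FROM timescaledb_information.jobs
        WHERE proc_name = 'execute_caggs_refresh_policy'  -- assumed proc name
          AND schedule_interval = refresh_interval
    ) THEN
        RETURN;
    END IF;
    PERFORM add_job(
        '_prom_catalog.execute_caggs_refresh_policy',
        refresh_interval,
        config => json_build_object('refresh_interval', refresh_interval)::jsonb
    );
END;
$$ LANGUAGE PLPGSQL;
```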
Signed-off-by: Harkishen-Singh <[email protected]>

This commit does 2 things: it handles compression and retention for metric-rollups, and for custom Caggs used for downsampling. The `How?` explanations are present as comments on the respective functions.
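For orientation, TimescaleDB's built-in policy API for compressing and expiring a Cagg looks like this. The view name and intervals below are placeholders, and this PR implements its own policy functions (e.g. `execute_caggs_compression_policy`) rather than necessarily calling these directly.

```sql
-- Hedged example of the underlying TimescaleDB policy API on a Cagg.
-- 'ps_5m.some_metric' and the intervals are placeholders.
ALTER MATERIALIZED VIEW ps_5m.some_metric SET (timescaledb.compress = true);

SELECT add_compression_policy('ps_5m.some_metric', compress_after => INTERVAL '7 days');
SELECT add_retention_policy('ps_5m.some_metric', drop_after => INTERVAL '90 days');
```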
…ple. Signed-off-by: Harkishen-Singh <[email protected]>
…ompressing Caggs. Signed-off-by: Harkishen-Singh <[email protected]>

This PR does 2 things:
1. Fixes the "Caggs already exists" exception that was raised when a Cagg for a resolution was created again on re-execution of `scan_for_new_rollups()`. Added a test under `metric_rollups_scan_for_new_rollups` to ensure correct behaviour.
2. `execute_caggs_compression_policy()` used to panic when looping over an already compressed chunk of a Cagg that still has some uncompressed chunks needing compression. Added a rerun of the compression policy to ensure correct behaviour.
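The idempotency fix in point 1 amounts to guarding creation with an existence check, so re-running the scan is a no-op for rollups that already exist. A sketch, with assumed variable names (the actual guard in the PR may differ):

```sql
-- Sketch: only create the Cagg if no view with that name exists yet,
-- making scan_for_new_rollups() safe to re-execute.
IF NOT EXISTS (
    SELECT 1 FROM information_schema.views
    WHERE table_schema = _rollup_schema AND table_name = _metric_name
) THEN
    EXECUTE format(
        'CREATE MATERIALIZED VIEW %I.%I WITH (timescaledb.continuous) AS %s',
        _rollup_schema, _metric_name, _view_definition);
END IF;
```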
Signed-off-by: Harkishen-Singh <[email protected]>
Signed-off-by: Harkishen-Singh <[email protected]>
Signed-off-by: Harkishen-Singh <[email protected]>
Signed-off-by: Harkishen-Singh <[email protected]>
Signed-off-by: Harkishen-Singh <[email protected]>

This commit increases the refresh range for a Cagg. This is important since earlier we accounted for only 1 bucket per refresh. This led to data loss on systems with many metrics/series, since the refresh took so long that it fell behind the refresh range. This commit addresses the issue by adding a _safety_refresh_start_buffer that starts the refresh from an earlier point, giving a bigger refresh range for short downsampling resolutions like 5 minutes and thereby preventing sample loss. This is usually not a problem for long downsampling resolutions like 1 hour.
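The effect of the buffer can be illustrated with TimescaleDB's refresh call. The Cagg name and the 15-minute buffer value below are assumptions for illustration; the point is that the window start moves back by the buffer, so the refresh covers several buckets instead of one.

```sql
-- Sketch: with a 5-minute resolution and an assumed 15-minute
-- _safety_refresh_start_buffer, the refresh window spans four buckets
-- instead of one, so a slow refresh cannot fall outside the window.
CALL refresh_continuous_aggregate(
    'ps_5m.some_metric',                                       -- placeholder Cagg
    now() - (INTERVAL '5 minutes' + INTERVAL '15 minutes'),    -- buffered start
    now()
);
```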
Signed-off-by: Harkishen-Singh <[email protected]>
Signed-off-by: Harkishen-Singh <[email protected]>
Force-pushed from c92646a to 6c3abd9 (Compare)
Signed-off-by: Harkishen-Singh <[email protected]>
Force-pushed from 6c3abd9 to ca87ecd (Compare)
Signed-off-by: Harkishen-Singh <[email protected]>
Harkishen-Singh commented on Jan 11, 2023
@@ -2562,7 +2564,7 @@ LANGUAGE PLPGSQL;
 REVOKE ALL ON FUNCTION _prom_catalog.create_metric_view(text) FROM PUBLIC;
 GRANT EXECUTE ON FUNCTION _prom_catalog.create_metric_view(text) TO prom_writer;

-CREATE OR REPLACE FUNCTION prom_api.register_metric_view(schema_name text, view_name text, if_not_exists BOOLEAN = false)
+CREATE OR REPLACE FUNCTION prom_api.register_metric_view(schema_name text, view_name text, refresh_interval INTERVAL, if_not_exists BOOLEAN = false, downsample_id BIGINT = NULL)
Note to self: documentation changes referring to refresh_interval.
_active_chunk_interval INTERVAL;
_ignore_downsampling_clause TEXT := '';
_refresh_downsampled_data BOOLEAN := (SELECT prom_api.get_global_downsampling_state()::BOOLEAN);
_should_refresh_clause TEXT := 'AND (SELECT should_refresh FROM _prom_catalog.downsample WHERE id = m.downsample_id) = TRUE';
Should there be a coalesce() here? Otherwise it's a bug, since it's not going to update for custom Caggs when _refresh_downsampled_data is true.
Maybe not, since for custom Caggs downsample_id will be NULL, so this clause has no effect?
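For reference, the coalesce variant raised in the first comment would look like the line below. The SQL detail behind the concern: when `downsample_id` is NULL the subquery yields NULL, and `AND NULL = TRUE` evaluates to NULL, so the row would not match; `COALESCE(..., TRUE)` makes custom Caggs pass the clause explicitly. Whether that behaviour is wanted here is exactly what the thread is debating; this is just the proposed alternative, not the PR's code.

```sql
-- Proposed COALESCE variant: custom Caggs (downsample_id IS NULL)
-- default to TRUE instead of dropping out of the predicate.
_should_refresh_clause TEXT :=
    'AND COALESCE((SELECT should_refresh FROM _prom_catalog.downsample WHERE id = m.downsample_id), TRUE) = TRUE';
```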
Description
TODO:
- irate() and counter_reset_sum() SQL aggregate

Merge requirements
Please take into account the following non-code changes that you may need to make with your PR: