This repository was archived by the owner on Apr 2, 2024. It is now read-only.

WIP: Adds support for Prometheus metric-rollups #594

Draft · wants to merge 17 commits into base: master
Changes from all commits (17 commits)
678f36e
Adds support for creating metric rollups based on metric types.
Harkishen-Singh Oct 17, 2022
b4e5b50
Add getter and setter functions for downsample config.
Harkishen-Singh Oct 18, 2022
d952a09
Add tests for creation/deletion of metric-rollups.
Harkishen-Singh Oct 28, 2022
8fe2853
Fix pgspot issues.
Harkishen-Singh Nov 1, 2022
73596cd
Add support for refreshing rollups and custom Caggs.
Harkishen-Singh Nov 3, 2022
fc3569e
Add support to compress and retain metric-rollups and custom Caggs.
Harkishen-Singh Nov 4, 2022
7147156
Add E2E tests for maintenance work in metric-rollups & custom downsam…
Harkishen-Singh Nov 7, 2022
eb6bf16
Fix: Exception on scan_for_new_rollups() for new resolutions & when c…
Harkishen-Singh Nov 17, 2022
a7cfe13
Resolve deadlock on refreshing rollups while ingesting samples.
Harkishen-Singh Nov 22, 2022
86a6d6f
Refactor scan_for_new_rollups to work in non-gracefull shutdown.
Harkishen-Singh Nov 22, 2022
13332d8
Perf: Avoid using most recent buckets when refreshing rollups.
Harkishen-Singh Dec 5, 2022
15dba56
Adds telemetry for metric-rollup
Harkishen-Singh Dec 16, 2022
fbfb943
Improvements to refreshing Caggs to avoid sample loss.
Harkishen-Singh Dec 22, 2022
aa98dfd
Update SQL objects in accordance with new downsampling UX.
Harkishen-Singh Dec 19, 2022
633a6fd
Add querying view & support for dowmsampling metrics with no metadata.
Harkishen-Singh Jan 2, 2023
ca87ecd
Implement apply_downsample_config and add E2E tests.
Harkishen-Singh Jan 6, 2023
647cb30
squashable: Proof reading 1.
Harkishen-Singh Jan 11, 2023
7 changes: 5 additions & 2 deletions CHANGELOG.md
@@ -14,6 +14,10 @@ We use the following categories for changes:

## [Unreleased]

### Added
- Support to create and delete metric downsampling based on metric types [#538]
- Support for maintenance of metric downsampling [#554]

## [0.8.0 - 2023-01-05]

### Changed
@@ -38,8 +42,7 @@ We use the following categories for changes:
- Added `_ps_catalog.compressed_chunks_missing_stats` view [#595]

### Fixed
- Column conflict when creating a metric view with a label called `series`
[#559]
- Column conflict when creating a metric view with a label called `series` [#559]

## [0.7.0 - 2022-10-03]

108 changes: 107 additions & 1 deletion docs/sql-api.md
@@ -57,6 +57,16 @@ get the default retention period for all metrics
```
function interval **prom_api.get_default_metric_retention_period**()
```
### prom_api.get_downsample_old_data

```
function boolean **prom_api.get_downsample_old_data**()
```
### prom_api.get_global_downsampling_state
Get the global automatic-downsampling state
```
function boolean **prom_api.get_global_downsampling_state**()
```
### prom_api.get_metric_chunk_interval
Get the chunk interval for a specific metric, or the default chunk interval if not explicitly set
```
@@ -135,7 +145,7 @@ function void **prom_api.promscale_post_restore**()
### prom_api.register_metric_view

```
function boolean **prom_api.register_metric_view**(schema_name text, view_name text, if_not_exists boolean DEFAULT false)
function boolean **prom_api.register_metric_view**(schema_name text, view_name text, refresh_interval interval, if_not_exists boolean DEFAULT false, downsample_id bigint DEFAULT NULL::bigint)
```
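For illustration, a registration call under the new signature might look like this (the schema, view, and interval values are hypothetical, not taken from this PR):

```
-- Register a user-created continuous aggregate as a metric view,
-- refreshed every 5 minutes by the shared Cagg refresh job.
SELECT prom_api.register_metric_view(
    schema_name      => 'ds_5m',        -- hypothetical schema
    view_name        => 'node_cpu_5m',  -- hypothetical Cagg name
    refresh_interval => INTERVAL '5 minutes',
    if_not_exists    => true
);
```

Passing `refresh_interval => NULL` disables automatic refresh, and the function then leaves the user responsible for creating a refresh policy.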
### prom_api.reset_metric_chunk_interval
resets the chunk interval for a specific metric to using the default
@@ -177,6 +187,16 @@ set the retention period for any metrics (existing and new) without an explicit 
```
function boolean **prom_api.set_default_retention_period**(retention_period interval)
```
### prom_api.set_downsample_old_data

```
function void **prom_api.set_downsample_old_data**(_state boolean)
```
### prom_api.set_global_downsampling_state
Set the automatic-downsampling state for metrics. Downsampled data is created only while this state is set to true
```
function void **prom_api.set_global_downsampling_state**(_state boolean)
```
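A minimal sketch of toggling the global state (the getters simply read back the stored setting):

```
-- Enable automatic downsampling; downsampled data is created only while this is true.
SELECT prom_api.set_global_downsampling_state(true);
SELECT prom_api.get_global_downsampling_state();

-- Downsampling of old (pre-existing) data defaults to false in this beta.
SELECT prom_api.set_downsample_old_data(false);
SELECT prom_api.get_downsample_old_data();
```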
### prom_api.set_metric_chunk_interval
set a chunk interval for a specific metric (this overrides the default)
```
@@ -492,11 +512,21 @@ the specified span_id.
```
function TABLE(trace_id trace_id, parent_span_id bigint, span_id bigint, dist integer, path bigint[]) **ps_trace.upstream_spans**(_trace_id trace_id, _span_id bigint, _max_dist integer DEFAULT NULL::integer)
```
### _prom_catalog.add_compression_clause_to_downsample_view

```
procedure void **_prom_catalog.add_compression_clause_to_downsample_view**(IN _schema text, IN _table_name text)
```
### _prom_catalog.add_job
A wrapper around public.add_job that introduces jitter to job start times and schedules.
```
function integer **_prom_catalog.add_job**(proc regproc, schedule_interval interval, config jsonb DEFAULT NULL::jsonb)
```
### _prom_catalog.apply_downsample_config

```
function void **_prom_catalog.apply_downsample_config**(config jsonb)
```
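The shape of `config` is not documented in this excerpt. Judging by the `ds_interval` and `retention` columns of `_prom_catalog.downsample` referenced in the telemetry changes, an invocation could plausibly look like the following (all field names and values are assumptions):

```
-- Hypothetical config: each entry pairs a downsampling resolution
-- with a retention period (field names are assumed, not confirmed).
SELECT _prom_catalog.apply_downsample_config(
    '[
        {"schema_name": "ds_5m", "ds_interval": "5 minutes", "retention": "30 days"},
        {"schema_name": "ds_1h", "ds_interval": "1 hour",    "retention": "365 days"}
     ]'::jsonb
);
```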
### _prom_catalog.attach_series_partition

```
@@ -522,6 +552,22 @@ procedure void **_prom_catalog.compress_old_chunks**(IN _hypertable_schema_name
```
function integer **_prom_catalog.count_jsonb_keys**(j jsonb)
```
### _prom_catalog.counter_reset_sum

```
function double precision **_prom_catalog.counter_reset_sum**(v double precision[])
```
### _prom_catalog.create_cagg_refresh_job_if_not_exists
Creates a Cagg refresh job that refreshes all Caggs registered by register_metric_view().
A refresh job is created only if no execute_caggs_refresh_policy() job with the given refresh_interval currently exists.
```
function void **_prom_catalog.create_cagg_refresh_job_if_not_exists**(_refresh_interval interval)
```
### _prom_catalog.create_default_downsampling_query_view

```
function void **_prom_catalog.create_default_downsampling_query_view**(_schema text, _table_name text, _default_column text)
```
### _prom_catalog.create_exemplar_table_if_not_exists

```
@@ -542,6 +588,11 @@ function text **_prom_catalog.create_ingest_temp_table**(table_name text, schema
```
function record **_prom_catalog.create_label_key**(new_key text, OUT id integer, OUT value_column_name name, OUT id_column_name name)
```
### _prom_catalog.create_metric_downsampling_view

```
function boolean **_prom_catalog.create_metric_downsampling_view**(_schema text, _metric_name text, _table_name text, _interval interval)
```
### _prom_catalog.create_metric_table

```
@@ -577,6 +628,11 @@ procedure void **_prom_catalog.decompress_chunks_after**(IN metric_table text, I
```
function void **_prom_catalog.delay_compression_job**(ht_table text, new_start timestamp with time zone)
```
### _prom_catalog.delete_downsampling

```
procedure void **_prom_catalog.delete_downsampling**(IN _schema_name text)
```
### _prom_catalog.delete_expired_series

```
@@ -597,6 +653,21 @@ function bigint **_prom_catalog.delete_series_from_metric**(name text, series_id
```
procedure void **_prom_catalog.do_decompress_chunks_after**(IN metric_table text, IN min_time timestamp with time zone, IN transactional boolean DEFAULT false)
```
### _prom_catalog.downsample_counter

```
procedure void **_prom_catalog.downsample_counter**(IN _schema text, IN _table_name text, IN _interval interval)
```
### _prom_catalog.downsample_gauge

```
procedure void **_prom_catalog.downsample_gauge**(IN _schema text, IN _table_name text, IN _interval interval)
```
### _prom_catalog.downsample_summary

```
procedure void **_prom_catalog.downsample_summary**(IN _schema text, IN _table_name text, IN _interval interval)
```
### _prom_catalog.drop_metric_chunk_data
drop chunks from schema_name.metric_name containing data older than older_than.
```
@@ -612,6 +683,31 @@
```
function void **_prom_catalog.epoch_abort**(user_epoch bigint)
```
### _prom_catalog.execute_caggs_compression_policy
execute_caggs_compression_policy is responsible for compressing Caggs registered via
register_metric_view() in _prom_catalog.metric. It goes through all entries in _prom_catalog.metric and tries to compress any Cagg that supports compression.
This covers both automatic metric downsampling and custom Cagg-based downsampling.
Note: execute_caggs_compression_policy runs at a regular interval and compresses only the inactive chunks of those Caggs that have timescaledb.compress = true.
```
procedure void **_prom_catalog.execute_caggs_compression_policy**(IN job_id integer, IN config jsonb)
```
### _prom_catalog.execute_caggs_refresh_policy
execute_caggs_refresh_policy runs every refresh_interval passed in config. Its
main aim is to refresh those Caggs that are registered in _prom_catalog.metric and whose view_refresh_interval
matches the given refresh_interval. It refreshes two kinds of Caggs:
1. Caggs created by metric downsampling
2. Custom Caggs created by the user
```
procedure void **_prom_catalog.execute_caggs_refresh_policy**(IN job_id integer, IN config jsonb)
```
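Given the _prom_catalog.add_job wrapper documented above, the refresh policy is presumably scheduled along these lines (the config key name is an assumption, not confirmed by this excerpt):

```
-- Sketch: schedule the Cagg refresh procedure through the jittered
-- add_job wrapper, running every 5 minutes.
SELECT _prom_catalog.add_job(
    '_prom_catalog.execute_caggs_refresh_policy'::regproc,
    INTERVAL '5 minutes',
    '{"refresh_interval": "5 minutes"}'::jsonb  -- key name assumed
);
```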
### _prom_catalog.execute_caggs_retention_policy
execute_caggs_retention_policy is responsible for applying data retention to continuous aggregates registered via
register_metric_view(). It loops through all entries in _prom_catalog.metric that are Caggs and tries to delete their stale chunks.
Staleness is determined by _prom_catalog.downsample.retention (for metric downsampling) and by the default_retention_period of the parent hypertable (for custom Caggs).
This covers both automatic metric downsampling and custom Cagg-based downsampling.
```
procedure void **_prom_catalog.execute_caggs_retention_policy**(IN job_id integer, IN config jsonb)
```
### _prom_catalog.execute_compression_policy
compress data according to the policy. This procedure should be run regularly in a cron job
```
@@ -847,6 +943,11 @@ function bigint **_prom_catalog.insert_metric_metadatas**(t timestamp with time 
```
function bigint **_prom_catalog.insert_metric_row**(metric_table text, time_array timestamp with time zone[], value_array double precision[], series_id_array bigint[])
```
### _prom_catalog.irate

```
function double precision **_prom_catalog.irate**(v double precision[])
```
### _prom_catalog.is_multinode

```
@@ -995,6 +1096,11 @@ function void **_prom_catalog.resurrect_series_ids**(metric_table text, series_i
```
function bigint **_prom_catalog.safe_approximate_row_count**(table_name_input regclass)
```
### _prom_catalog.scan_for_new_downsampling_views

```
procedure void **_prom_catalog.scan_for_new_downsampling_views**(IN job_id integer, IN config jsonb)
```
### _prom_catalog.set_app_name

```
63 changes: 46 additions & 17 deletions migration/idempotent/001-base.sql
@@ -12,7 +12,9 @@ FROM
('trace_retention_period' , (30 * INTERVAL '1 days')::text),
('ha_lease_timeout' , '1m'),
('ha_lease_refresh' , '10s'),
('epoch_duration' , (INTERVAL '12 hours')::text)
('epoch_duration' , (INTERVAL '12 hours')::text),
('downsample' , 'false'),
('downsample_old_data' , 'false') -- For beta release, we do not plan on refreshing old metric data.
) d(key, value)
;
GRANT SELECT ON _prom_catalog.initial_default TO prom_reader;
@@ -2562,7 +2564,7 @@ LANGUAGE PLPGSQL;
REVOKE ALL ON FUNCTION _prom_catalog.create_metric_view(text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION _prom_catalog.create_metric_view(text) TO prom_writer;

CREATE OR REPLACE FUNCTION prom_api.register_metric_view(schema_name text, view_name text, if_not_exists BOOLEAN = false)
CREATE OR REPLACE FUNCTION prom_api.register_metric_view(schema_name text, view_name text, refresh_interval INTERVAL, if_not_exists BOOLEAN = false, downsample_id BIGINT = NULL)
Author's note: Documentation changes referring to refresh_interval.

RETURNS BOOLEAN
SECURITY DEFINER
VOLATILE
@@ -2580,29 +2582,49 @@ BEGIN
AND table_name = register_metric_view.view_name;

IF NOT FOUND THEN
RAISE EXCEPTION 'cannot register non-existent metric view in specified schema';
RAISE EXCEPTION 'cannot register non-existent metric view with name % in specified schema %', view_name, schema_name;
END IF;

-- cannot register view in data schema
IF schema_name = 'prom_data' THEN
RAISE EXCEPTION 'cannot register metric view in prom_data schema';
END IF;

-- check if view is based on a metric from prom_data
-- we check for two levels so we can support 2-step continuous aggregates
SELECT v.view_schema, v.view_name, v.metric_table_name
INTO agg_schema, agg_name, metric_table_name
FROM _prom_catalog.get_first_level_view_on_metric(schema_name, view_name) v;
IF downsample_id IS NOT NULL THEN
-- Check if the materialized view is at least created.
PERFORM 1 FROM pg_views WHERE viewname = view_name AND schemaname = schema_name;
IF NOT FOUND THEN
RAISE EXCEPTION 'No materialized view like %.% found', schema_name, view_name;
END IF;

IF NOT FOUND THEN
RAISE EXCEPTION 'view not based on a metric table from prom_data schema';
IF refresh_interval IS NULL THEN
RAISE EXCEPTION 'refresh_interval must not be null for automatic-downsampling views';
END IF;

-- We do not do the checks offered by get_first_level_view_on_metric() for automatic metric downsampling
-- because those checks are not meant for Caggs with timescaledb.materialized_only = true.
--
-- Since automatic metric downsampling views are Caggs created internally, we should be fine with
-- skipping the "strict" safety checks.
metric_table_name := view_name;
agg_name := view_name;
agg_schema := schema_name;
ELSE
-- check if view is based on a metric from prom_data
-- we check for two levels so we can support 2-step continuous aggregates
SELECT v.view_schema, v.view_name, v.metric_table_name
INTO agg_schema, agg_name, metric_table_name
FROM _prom_catalog.get_first_level_view_on_metric(schema_name, view_name) v;

IF NOT FOUND THEN
RAISE EXCEPTION 'view with name % not based on a metric table from prom_data schema', view_name;
END IF;
END IF;

-- check if the view contains necessary columns with the correct types
SELECT count(*) FROM information_schema.columns
INTO column_count
SELECT count(*) INTO column_count FROM information_schema.columns
WHERE table_schema = register_metric_view.schema_name
AND table_name = register_metric_view.view_name
AND table_name = register_metric_view.view_name
AND ((column_name = 'time' AND data_type = 'timestamp with time zone')
OR (column_name = 'series_id' AND data_type = 'bigint')
OR data_type = 'double precision');
@@ -2611,9 +2633,16 @@ BEGIN
RAISE EXCEPTION 'view must contain time (data type: timestamp with time zone), series_id (data type: bigint), and at least one column with double precision data type';
END IF;

IF refresh_interval IS NULL THEN
-- When a Cagg that is not an automatic metric downsampling view is created, inform the user that they need to create a refresh policy themselves.
RAISE NOTICE 'Automatic refresh is disabled since refresh_interval is NULL. Please create a refresh policy for this Cagg yourself';
ELSE
PERFORM _prom_catalog.create_cagg_refresh_job_if_not_exists(refresh_interval);
END IF;

-- insert into metric table
INSERT INTO _prom_catalog.metric (metric_name, table_name, table_schema, series_table, is_view, creation_completed)
VALUES (register_metric_view.view_name, register_metric_view.view_name, register_metric_view.schema_name, metric_table_name, true, true)
INSERT INTO _prom_catalog.metric (metric_name, table_name, table_schema, series_table, is_view, creation_completed, view_refresh_interval, downsample_id)
VALUES (register_metric_view.view_name, register_metric_view.view_name, register_metric_view.schema_name, metric_table_name, true, true, refresh_interval, register_metric_view.downsample_id)
ON CONFLICT DO NOTHING;

IF NOT FOUND THEN
@@ -2638,8 +2667,8 @@ END
$func$
LANGUAGE PLPGSQL;
--redundant given schema settings but extra caution for security definers
REVOKE ALL ON FUNCTION prom_api.register_metric_view(text, text, boolean) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION prom_api.register_metric_view(text, text, boolean) TO prom_admin;
REVOKE ALL ON FUNCTION prom_api.register_metric_view(text, text, interval, boolean, bigint) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION prom_api.register_metric_view(text, text, interval, boolean, bigint) TO prom_admin;

CREATE OR REPLACE FUNCTION prom_api.unregister_metric_view(schema_name text, view_name text, if_exists BOOLEAN = false)
RETURNS BOOLEAN
7 changes: 7 additions & 0 deletions migration/idempotent/010-telemetry.sql
@@ -100,6 +100,13 @@
SELECT _prom_catalog.get_default_value('chunk_interval') INTO result;
PERFORM _ps_catalog.apply_telemetry('metrics_default_chunk_interval', result);

-- Metric downsampling.
SELECT prom_api.get_global_downsampling_state()::TEXT INTO result;
PERFORM _ps_catalog.apply_telemetry('metrics_downsampling_enabled', result);

SELECT array_agg(ds_interval || ':' || retention)::TEXT INTO result FROM _prom_catalog.downsample; -- Example: {00:05:00:720:00:00,01:00:00:8760:00:00} => {HH:MM:SS}
PERFORM _ps_catalog.apply_telemetry('metrics_downsampling_configs', result);

IF ( SELECT count(*)>0 FROM _prom_catalog.metric WHERE metric_name = 'prometheus_tsdb_head_series' ) THEN
-- Calculate active series in Promscale. This is done by taking the help of the Prometheus metric 'prometheus_tsdb_head_series'.
-- An active series for Promscale is basically sum of active series of all Prometheus instances writing into Promscale