Commit c1f1c4b

charislam and Loquacity authored
reorg(percentile approximation): migrate to new hyperfunctions template (timescale#1799)
Co-authored-by: Lana Brindley <[email protected]>
1 parent e4a8699 commit c1f1c4b

35 files changed, with 986 additions and 913 deletions

_tutorials/deprecated.json

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
+[]
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+---
+api_name: approx_percentile()
+excerpt: Estimate the value at a given percentile from a `tdigest`
+topics: [hyperfunctions]
+api:
+  license: community
+  type: function
+  toolkit: true
+  version:
+    experimental: 0.2.0
+    stable: 1.0.0
+hyperfunction:
+  family: percentile approximation
+  type: accessor
+  aggregates:
+    - tdigest()
+api_details:
+  summary: Estimate the approximate value at a percentile from a `tdigest` aggregate.
+  signatures:
+    - language: sql
+      code: |
+        approx_percentile(
+            percentile DOUBLE PRECISION,
+            tdigest TDigest
+        ) RETURNS DOUBLE PRECISION
+  parameters:
+    required:
+      - name: percentile
+        type: DOUBLE PRECISION
+        description: The percentile to compute. Must be within the range `[0.0, 1.0]`.
+      - name: tdigest
+        type: TDigest
+        description: The `tdigest` aggregate.
+    returns:
+      - column: approx_percentile
+        type: DOUBLE PRECISION
+        description: The estimated value at the requested percentile.
+  examples:
+    - description: >
+        Estimate the value at the first percentile, given a sample containing the
+        numbers from 0 to 100.
+      command:
+        code: |
+          SELECT
+            approx_percentile(0.01, tdigest(data))
+          FROM generate_series(0, 100) data;
+      return:
+        code: |
+           approx_percentile
+          -------------------
+                       0.999
+---
+
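A `tdigest` can be built once and read several times. The sketch below is a minimal illustration, not part of the committed docs: the CTE name `digest`, the alias `td`, and the bucket count of 100 are hypothetical choices, and the two-argument `tdigest(100, data)` form follows the `max_val()` example in this commit. Output is omitted because the values are estimates.

```sql
-- Build one digest, then read two percentiles from it.
WITH digest AS (
    SELECT tdigest(100, data) AS td   -- 100 buckets: an illustrative choice
    FROM generate_series(0, 100) data
)
SELECT
    approx_percentile(0.50, td) AS p50,
    approx_percentile(0.99, td) AS p99
FROM digest;
```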
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+---
+api_name: approx_percentile_rank()
+excerpt: Estimate the percentile of a given value from a `tdigest`
+topics: [hyperfunctions]
+api:
+  license: community
+  type: function
+  toolkit: true
+  version:
+    experimental: 0.2.0
+    stable: 1.0.0
+hyperfunction:
+  family: percentile approximation
+  type: accessor
+  aggregates:
+    - tdigest()
+api_details:
+  summary: Estimate the percentile at which a given value would be located.
+  signatures:
+    - language: sql
+      code: |
+        approx_percentile_rank(
+            value DOUBLE PRECISION,
+            digest TDigest
+        ) RETURNS DOUBLE PRECISION
+  parameters:
+    required:
+      - name: value
+        type: DOUBLE PRECISION
+        description: The value to estimate the percentile of.
+      - name: digest
+        type: TDigest
+        description: The `tdigest` aggregate.
+    returns:
+      - column: approx_percentile_rank
+        type: DOUBLE PRECISION
+        description: The estimated percentile associated with the provided value.
+  examples:
+    - description: >
+        Estimate the percentile rank of the value `99`, given a sample containing
+        the numbers from 0 to 100.
+      command:
+        code: |
+          SELECT
+            approx_percentile_rank(99, tdigest(data))
+          FROM generate_series(0, 100) data;
+      return:
+        code: |
+           approx_percentile_rank
+          ----------------------------
+                   0.9851485148514851
+---
+
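For comparison, plain PostgreSQL can compute an exact rank with the built-in hypothetical-set aggregate `percent_rank`. This is only a sketch for sanity-checking an estimate; the alias `exact_rank` is hypothetical, and because `percent_rank` and the t-digest estimate define rank slightly differently, the two numbers need not match exactly.

```sql
-- Exact relative rank of the value 99 among the integers 0..100,
-- using PostgreSQL's hypothetical-set aggregate (no Toolkit required).
SELECT percent_rank(99) WITHIN GROUP (ORDER BY data) AS exact_rank
FROM generate_series(0, 100) data;
```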
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+---
+section: hyperfunction
+subsection: tdigest()
+---
+
+### Aggregate and roll up percentile data to calculate daily percentiles
+
+Create an hourly continuous aggregate that contains a percentile aggregate:
+
+```sql
+CREATE MATERIALIZED VIEW foo_hourly
+WITH (timescaledb.continuous)
+AS SELECT
+    time_bucket('1 h'::interval, ts) as bucket,
+    tdigest(value) as tdigest
+FROM foo
+GROUP BY 1;
+```
+
+You can use accessors to query directly from the continuous aggregate for
+hourly data. You can also roll the hourly data up into daily buckets, then
+calculate approximate percentiles:
+
+```sql
+SELECT
+    time_bucket('1 day'::interval, bucket) as bucket,
+    approx_percentile(0.95, rollup(tdigest)) as p95,
+    approx_percentile(0.99, rollup(tdigest)) as p99
+FROM foo_hourly
+GROUP BY 1;
+```
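The hourly case mentioned above, where accessors read straight from the continuous aggregate, could look like the following sketch. It assumes the `foo_hourly` view and its `bucket` and `tdigest` columns exactly as defined in the example file; no `rollup` is needed because each row already holds one hour's digest.

```sql
-- Hourly percentiles read directly from the continuous aggregate.
SELECT
    bucket,
    approx_percentile(0.95, tdigest) AS p95,
    approx_percentile(0.99, tdigest) AS p99
FROM foo_hourly
ORDER BY bucket;
```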

api/_hyperfunctions/tdigest/intro.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
+---
+section: hyperfunction
+subsection: tdigest()
+---
+
+Estimate the value at a given percentile, or the percentile rank of a given
+value, using the t-digest algorithm. This estimation is more memory- and
+CPU-efficient than an exact calculation using PostgreSQL's `percentile_cont` and
+`percentile_disc` functions.
+
+`tdigest` is one of two advanced percentile approximation aggregates provided in
+TimescaleDB Toolkit. It is a space-efficient aggregation, and it provides more
+accurate estimates at extreme quantiles than traditional methods.
+
+`tdigest` is somewhat dependent on input order. If `tdigest` is run on the same
+data arranged in different order, the results should be nearly equal, but they
+are unlikely to be exact.
+
+The other advanced percentile approximation aggregate is
+[`uddsketch`][uddsketch], which produces stable estimates within a guaranteed
+relative error. If you aren't sure which to use, try the default percentile
+estimation method, [`percentile_agg`][percentile_agg]. It uses the `uddsketch`
+algorithm with some sensible defaults.
+
+For more information about percentile approximation algorithms, see the
+[algorithms overview][algorithms].
+
+[algorithms]: /timescaledb/:currentVersion:/how-to-guides/hyperfunctions/percentile-approx/advanced-agg/
+[percentile_agg]: /api/:currentVersion:/hyperfunctions/percentile-approximation/uddsketch/#percentile_agg
+[uddsketch]: /api/:currentVersion:/hyperfunctions/percentile-approximation/uddsketch/
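To make the contrast with an exact calculation concrete, the sketch below runs PostgreSQL's built-in `percentile_cont` over the same input used in the `approx_percentile()` example in this commit. The alias `exact_p01` is hypothetical. For the integers 0 to 100, the exact first percentile is 1, close to the 0.999 estimate shown in that example; the exact query has to sort every input row, which is the cost the digest avoids.

```sql
-- Exact first percentile with a built-in ordered-set aggregate.
SELECT percentile_cont(0.01) WITHIN GROUP (ORDER BY data) AS exact_p01
FROM generate_series(0, 100) data;
```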

api/_hyperfunctions/tdigest/max_val

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+---
+api_name: max_val()
+excerpt: Get the maximum value from a `tdigest`
+topics: [hyperfunctions]
+api:
+  license: community
+  type: function
+  toolkit: true
+  version:
+    experimental: 0.1.0
+    stable: 1.0.0
+hyperfunction:
+  family: percentile approximation
+  type: accessor
+  aggregates:
+    - tdigest()
+api_details:
+  summary: >
+    Get the maximum value from a `tdigest`. This accessor allows you to
+    calculate the maximum alongside percentiles, without needing to create
+    two separate aggregates from the same raw data.
+  signatures:
+    - code: |
+        max_val(
+            digest TDigest
+        ) RETURNS DOUBLE PRECISION
+  parameters:
+    required:
+      - name: digest
+        type: TDigest
+        description: The digest to extract the max value from.
+    returns:
+      - column: max_val
+        type: DOUBLE PRECISION
+        description: The maximum value entered into the `tdigest`.
+  examples:
+    - description: >
+        Get the maximum of the integers from 1 to 100.
+      command:
+        code: |
+          SELECT max_val(tdigest(100, data))
+          FROM generate_series(1, 100) data;
+      return:
+        code: |
+           max_val
+          ---------
+               100
+---
+

api/_hyperfunctions/tdigest/mean.md

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+---
+api_name: mean()
+excerpt: Calculate the exact mean from values in a `tdigest`
+topics: [hyperfunctions]
+api:
+  license: community
+  type: function
+  toolkit: true
+  version:
+    experimental: 0.1.0
+    stable: 1.0.0
+hyperfunction:
+  family: percentile approximation
+  type: accessor
+  aggregates:
+    - tdigest()
+api_details:
+  summary: >
+    Calculate the exact mean of the values in a `tdigest` aggregate. Unlike
+    percentile calculations, the mean calculation is exact. This accessor
+    allows you to calculate the mean alongside percentiles, without needing to
+    create two separate aggregates from the same raw data.
+  signatures:
+    - language: sql
+      code: |
+        mean(
+            digest TDigest
+        ) RETURNS DOUBLE PRECISION
+  parameters:
+    required:
+      - name: digest
+        type: TDigest
+        description: The `tdigest` aggregate to extract the mean from.
+    returns:
+      - column: mean
+        type: DOUBLE PRECISION
+        description: The mean of the values in the `tdigest` aggregate.
+  examples:
+    - description: >
+        Calculate the mean of the integers from 0 to 100.
+      command:
+        code: |
+          SELECT mean(tdigest(data))
+          FROM generate_series(0, 100) data;
+      return:
+        code: |
+           mean
+          ------
+             50
+---
+

api/_hyperfunctions/tdigest/min_val

Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
+---
+api_name: min_val()
+excerpt: Get the minimum value from a `tdigest`
+topics: [hyperfunctions]
+api:
+  license: community
+  type: function
+  toolkit: true
+  version:
+    experimental: 0.1.0
+    stable: 1.0.0
+hyperfunction:
+  family: percentile approximation
+  type: accessor
+  aggregates:
+    - tdigest()
+api_details:
+  summary: >
+    Get the minimum value from a `tdigest`. This accessor allows you to
+    calculate the minimum alongside percentiles, without needing to create
+    two separate aggregates from the same raw data.
+  signatures:
+    - code: |
+        min_val(
+            digest TDigest
+        ) RETURNS DOUBLE PRECISION
+  parameters:
+    required:
+      - name: digest
+        type: TDigest
+        description: The digest to extract the minimum value from.
+    returns:
+      - column: min_val
+        type: DOUBLE PRECISION
+        description: The minimum value entered into the `tdigest`.
+  examples:
+    - description: >
+        Get the minimum of the integers from 1 to 100.
+      command:
+        code: |
+          SELECT min_val(tdigest(100, data))
+          FROM generate_series(1, 100) data;
+      return:
+        code: |
+           min_val
+          ---------
+                 1
+---
+
Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+---
+api_name: num_vals()
+excerpt: Get the number of values contained in a `tdigest`
+topics: [hyperfunctions]
+api:
+  license: community
+  type: function
+  toolkit: true
+  version:
+    experimental: 0.1.0
+    stable: 1.0.0
+hyperfunction:
+  family: percentile approximation
+  type: accessor
+  aggregates:
+    - tdigest()
+api_details:
+  summary: >
+    Get the number of values contained in a `tdigest` aggregate. This accessor
+    allows you to calculate a count alongside percentiles, without needing to
+    create two separate aggregates from the same raw data.
+  signatures:
+    - language: sql
+      code: |
+        num_vals(
+            digest TDigest
+        ) RETURNS DOUBLE PRECISION
+  parameters:
+    required:
+      - name: digest
+        type: TDigest
+        description: The `tdigest` aggregate to extract the number of values from.
+    returns:
+      - column: num_vals
+        type: DOUBLE PRECISION
+        description: The number of values in the `tdigest` aggregate.
+  examples:
+    - description: >
+        Count the number of integers from 0 to 100.
+      command:
+        code: |
+          SELECT num_vals(tdigest(data))
+          FROM generate_series(0, 100) data;
+      return:
+        code: |
+           num_vals
+          -----------
+                 101
+---
+
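The accessor summaries above all make the same point: one digest can serve several statistics. A minimal sketch of that pattern follows, assuming the two-argument `tdigest(100, data)` form used in the `max_val()` example; the CTE name `digest` and the output aliases are hypothetical. Output is omitted because the median is an estimate.

```sql
-- One pass over the raw data, several statistics from the same digest.
WITH digest AS (
    SELECT tdigest(100, data) AS td   -- 100 buckets: an illustrative choice
    FROM generate_series(1, 100) data
)
SELECT
    min_val(td)                AS lowest,
    max_val(td)                AS highest,
    mean(td)                   AS average,
    num_vals(td)               AS value_count,
    approx_percentile(0.5, td) AS median_estimate
FROM digest;
```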
