Commit d668a87

Authored by mkindahl, erimatnor, and JamesGuthrie
New hypertable api (#2773)
* Document new hypertable API (#2732)

  Just make minimal changes to document the new API. Sections on using
  the new API will be following.

  Co-authored-by: Erik Nordström <[email protected]>

* Fix broken links in distributed hypertable docs (#2767)

* Fix links to dimension builders

* Update api/add_dimension.md

  Co-authored-by: James Guthrie <[email protected]>
  Signed-off-by: Mats Kindahl <[email protected]>

---------

Signed-off-by: Mats Kindahl <[email protected]>
Co-authored-by: Erik Nordström <[email protected]>
Co-authored-by: James Guthrie <[email protected]>
1 parent 5496de3 commit d668a87

44 files changed: +588 −190 lines
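Across this commit, the pattern is the same: calls to `create_hypertable` that passed a bare time column name (and optional inline space-partitioning arguments) are rewritten to use the new dimension builders, `by_range` and `by_hash`. A minimal before/after sketch (the `readings` table and `device` column are illustrative, not taken from any file in this commit):

```sql
-- Old API: time column as a bare name, hash ("space") partitioning inline.
SELECT create_hypertable('readings', 'time', 'device', 4);

-- New generalized API: one dimension builder per partitioning dimension.
SELECT create_hypertable('readings', by_range('time'));
SELECT add_dimension('readings', by_hash('device', 4));
```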

Diff for: _partials/_create-hypertable-blockchain.md (+1 −1)

````diff
@@ -36,7 +36,7 @@ with Timescale tables similar to standard PostgreSQL.
    the timestamp data to use for partitioning:
 
    ```sql
-   SELECT create_hypertable('transactions', 'time');
+   SELECT create_hypertable('transactions', by_range('time'));
    ```
 
 1. Create an index on the `hash` column to make queries for individual
````

Diff for: _partials/_create-hypertable-energy.md (+1 −1)

````diff
@@ -27,7 +27,7 @@ with Timescale tables similar to standard PostgreSQL.
    the timestamp data to use for partitioning:
 
    ```sql
-   SELECT create_hypertable('metrics', 'created');
+   SELECT create_hypertable('metrics', by_range('created'));
    ```
 
 </Procedure>
````

Diff for: _partials/_create-hypertable-nyctaxis.md (+2 −1)

````diff
@@ -42,7 +42,8 @@ with Timescale tables similar to standard PostgreSQL.
    the timestamp data to use for partitioning:
 
    ```sql
-   SELECT create_hypertable('rides', 'pickup_datetime', 'payment_type', 2, create_default_indexes=>FALSE);
+   SELECT create_hypertable('rides', by_range('pickup_datetime'), create_default_indexes=>FALSE);
+   SELECT add_dimension('rides', by_hash('payment_type', 2));
    ```
 
 1. Create an index to support efficient queries by vendor, rate code, and
````

Diff for: _partials/_create-hypertable-twelvedata-crypto.md (+1 −1)

````diff
@@ -29,7 +29,7 @@ with Timescale tables similar to standard PostgreSQL.
    the timestamp data to use for partitioning:
 
    ```sql
-   SELECT create_hypertable('crypto_ticks', 'time');
+   SELECT create_hypertable('crypto_ticks', by_range('time'));
    ```
 
 </Procedure>
````

Diff for: _partials/_create-hypertable-twelvedata-stocks.md (+1 −1)

````diff
@@ -29,7 +29,7 @@ with Timescale tables similar to standard PostgreSQL.
    the timestamp data to use for partitioning:
 
    ```sql
-   SELECT create_hypertable('stocks_real_time','time');
+   SELECT create_hypertable('stocks_real_time', by_range('time'));
    ```
 
 1. Create an index to support efficient queries on the `symbol` and `time`
````

Diff for: _troubleshooting/caggs-real-time-previously-materialized-not-shown.md (+1 −2)

````diff
@@ -37,8 +37,7 @@ CREATE TABLE conditions(
     temperature INT NOT NULL);
 
 SELECT create_hypertable(
-    'conditions', 'day',
-    chunk_time_interval => INTERVAL '1 day'
+    'conditions', by_range('day', INTERVAL '1 day')
 );
 
 INSERT INTO conditions (day, city, temperature) VALUES
````

Diff for: api/add_data_node.md (+1 −1)

````diff
@@ -102,7 +102,7 @@ TimescaleDB extension on the data node unless it is already installed.
 ### Sample usage
 
 If you have an existing hypertable `conditions` and want to use `time`
-as the time partitioning column and `location` as the space partitioning
+as the range partitioning column and `location` as the hash partitioning
 column. You also want to distribute the chunks of the hypertable on two
 data nodes `dn1.example.com` and `dn2.example.com`:
 
````

Diff for: api/add_dimension.md (+30 −50)

````diff
@@ -13,38 +13,40 @@ api:
 
 Add an additional partitioning dimension to a Timescale hypertable.
 The column selected as the dimension can either use interval
-partitioning (for example, for a second time partition) or hash partitioning.
+partitioning (for example, for a second range partition) or hash partitioning.
+
+**Note: this reference describes the new generalized hypertable API. The [old interface for `add_dimension` is also available](add_dimension_old.md).**
 
 <Highlight type="warning">
 The `add_dimension` command can only be executed after a table has been
 converted to a hypertable (via `create_hypertable`), but must similarly
 be run only on an empty hypertable.
 </Highlight>
 
-**Space partitions**: Using space partitions is highly recommended
+**Hash partitions (previously called space partitions)**: Using hash partitions is highly recommended
 for [distributed hypertables][distributed-hypertables] to achieve
 efficient scale-out performance. For [regular hypertables][regular-hypertables]
 that exist only on a single node, additional partitioning can be used
 for specialized use cases and not recommended for most users.
 
-Space partitions use hashing: Every distinct item is hashed to one of
-*N* buckets. Remember that we are already using (flexible) time
-intervals to manage chunk sizes; the main purpose of space
+Every distinct item in hash partitioning is hashed to one of
+*N* buckets. Remember that we are already using (flexible) range
+intervals to manage chunk sizes; the main purpose of hash
 partitioning is to enable parallelization across multiple
 data nodes (in the case of distributed hypertables) or
 across multiple disks within the same time interval
 (in the case of single-node deployments).
 
 ### Parallelizing queries across multiple data nodes
 
-In a distributed hypertable, space partitioning enables inserts to be
+In a distributed hypertable, hash partitioning enables inserts to be
 parallelized across data nodes, even while the inserted rows share
 timestamps from the same time interval, and thus increases the ingest rate.
 Query performance also benefits by being able to parallelize queries
 across nodes, particularly when full or partial aggregations can be
 "pushed down" to data nodes (for example, as in the query
 `avg(temperature) FROM conditions GROUP BY hour, location`
-when using `location` as a space partition). Please see our
+when using `location` as a hash partition). Please see our
 [best practices about partitioning in distributed hypertables][distributed-hypertable-partitioning-best-practices]
 for more information.
@@ -71,10 +73,10 @@ disks, single query to multiple disks in parallel). The multiple
 tablespace approach only supports the former. With a RAID setup,
 *no spatial partitioning is required*.
 
-That said, when using space partitions, we recommend using 1
-space partition per disk.
+That said, when using hash partitions, we recommend using 1
+hash partition per disk.
 
-Timescale does *not* benefit from a very large number of space
+Timescale does *not* benefit from a very large number of hash
 partitions (such as the number of unique items you expect in partition
 field). A very large number of such partitions leads both to poorer
 per-partition load balancing (the mapping of items to partitions using
@@ -86,7 +88,7 @@ queries.
 |Name|Type|Description|
 |-|-|-|
 |`hypertable`|REGCLASS|Hypertable to add the dimension to|
-|`column_name`|TEXT|Column to partition by|
+|`dimension`|DIMENSION_INFO|Dimension to partition by|
 
 ### Optional arguments
 
@@ -102,66 +104,44 @@ queries.
 |Column|Type|Description|
 |-|-|-|
 |`dimension_id`|INTEGER|ID of the dimension in the TimescaleDB internal catalog|
-|`schema_name`|TEXT|Schema name of the hypertable|
-|`table_name`|TEXT|Table name of the hypertable|
-|`column_name`|TEXT|Column name of the column to partition by|
 |`created`|BOOLEAN|True if the dimension was added, false when `if_not_exists` is true and no dimension was added|
 
-When executing this function, either `number_partitions` or
-`chunk_time_interval` must be supplied, which dictates if the
-dimension uses hash or interval partitioning.
-
-The `chunk_time_interval` should be specified as follows:
-
-* If the column to be partitioned is a TIMESTAMP, TIMESTAMPTZ, or
-  DATE, this length should be specified either as an INTERVAL type or
-  an integer value in *microseconds*.
-
-* If the column is some other integer type, this length
-  should be an integer that reflects
-  the column's underlying semantics (for example, the
-  `chunk_time_interval` should be given in milliseconds if this column
-  is the number of milliseconds since the UNIX epoch).
-
-<Highlight type="warning">
-Supporting more than **one** additional dimension is currently
-experimental. For any production environments, users are recommended
-to use at most one "space" dimension.
-
-</Highlight>
-
 ### Sample use
 
-First convert table `conditions` to hypertable with just time
-partitioning on column `time`, then add an additional partition key on `location` with four partitions:
+First convert table `conditions` to hypertable with just range
+partitioning on column `time`, then add an additional partition key on
+`location` with four partitions:
 
 ```sql
-SELECT create_hypertable('conditions', 'time');
-SELECT add_dimension('conditions', 'location', number_partitions => 4);
+SELECT create_hypertable('conditions', by_range('time'));
+SELECT add_dimension('conditions', by_hash('location', 4));
 ```
 
-Convert table `conditions` to hypertable with time partitioning on `time` and
-space partitioning (2 partitions) on `location`, then add two additional dimensions.
+Convert table `conditions` to hypertable with range partitioning on
+`time`, then add three additional dimensions: one hash partitioning on
+`location`, one range partition on `time_received`, and one hash
+partitioning on `device_id`.
 
 ```sql
-SELECT create_hypertable('conditions', 'time', 'location', 2);
-SELECT add_dimension('conditions', 'time_received', chunk_time_interval => INTERVAL '1 day');
-SELECT add_dimension('conditions', 'device_id', number_partitions => 2);
-SELECT add_dimension('conditions', 'device_id', number_partitions => 2, if_not_exists => true);
+SELECT create_hypertable('conditions', by_range('time'));
+SELECT add_dimension('conditions', by_hash('location', 2));
+SELECT add_dimension('conditions', by_range('time_received', INTERVAL '1 day'));
+SELECT add_dimension('conditions', by_hash('device_id', 2));
+SELECT add_dimension('conditions', by_hash('device_id', 2), if_not_exists => true);
 ```
 
 Now in a multi-node example for distributed hypertables with a cluster
 of one access node and two data nodes, configure the access node for
 access to the two data nodes. Then, convert table `conditions` to
-a distributed hypertable with just time partitioning on column `time`,
-and finally add a space partitioning dimension on `location`
+a distributed hypertable with just range partitioning on column `time`,
+and finally add a hash partitioning dimension on `location`
 with two partitions (as the number of the attached data nodes).
 
 ```sql
 SELECT add_data_node('dn1', host => 'dn1.example.com');
 SELECT add_data_node('dn2', host => 'dn2.example.com');
 SELECT create_distributed_hypertable('conditions', 'time');
-SELECT add_dimension('conditions', 'location', number_partitions => 2);
+SELECT add_dimension('conditions', by_hash('location', 2));
 ```
 
 [create_hypertable]: /api/:currentVersion:/hypertable/create_hypertable/
````
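The hunks in this commit also show that the builders take the partitioning width directly: `by_range` accepts an interval as its second argument (an INTERVAL for timestamp columns, a plain integer for integer columns), and `by_hash` takes the bucket count. A hedged sketch, with the `events` table and its columns invented for illustration:

```sql
-- Range dimension with an explicit 7-day chunk interval.
SELECT create_hypertable('events', by_range('created_at', INTERVAL '7 days'));

-- Second dimension: hash partitioning with 8 buckets.
SELECT add_dimension('events', by_hash('tenant_id', 8));
```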

Diff for: api/add_dimension_old.md (+170, new file)

````markdown
---
api_name: add_dimension()
excerpt: Add a space-partitioning dimension to a hypertable
topics: [hypertables]
keywords: [hypertables, partitions]
tags: [dimensions, chunks]
api:
  license: apache
  type: function
---

# add_dimension()

Add an additional partitioning dimension to a Timescale hypertable.
The column selected as the dimension can either use interval
partitioning (for example, for a second time partition) or hash partitioning.

<Highlight type="warning">
The `add_dimension` command can only be executed after a table has been
converted to a hypertable (via `create_hypertable`), but must similarly
be run only on an empty hypertable.
</Highlight>

**Space partitions**: Using space partitions is highly recommended
for [distributed hypertables][distributed-hypertables] to achieve
efficient scale-out performance. For [regular hypertables][regular-hypertables]
that exist only on a single node, additional partitioning can be used
for specialized use cases and not recommended for most users.

Space partitions use hashing: Every distinct item is hashed to one of
*N* buckets. Remember that we are already using (flexible) time
intervals to manage chunk sizes; the main purpose of space
partitioning is to enable parallelization across multiple
data nodes (in the case of distributed hypertables) or
across multiple disks within the same time interval
(in the case of single-node deployments).

### Parallelizing queries across multiple data nodes

In a distributed hypertable, space partitioning enables inserts to be
parallelized across data nodes, even while the inserted rows share
timestamps from the same time interval, and thus increases the ingest rate.
Query performance also benefits by being able to parallelize queries
across nodes, particularly when full or partial aggregations can be
"pushed down" to data nodes (for example, as in the query
`avg(temperature) FROM conditions GROUP BY hour, location`
when using `location` as a space partition). Please see our
[best practices about partitioning in distributed hypertables][distributed-hypertable-partitioning-best-practices]
for more information.

### Parallelizing disk I/O on a single node

Parallel I/O can benefit in two scenarios: (a) two or more concurrent
queries should be able to read from different disks in parallel, or
(b) a single query should be able to use query parallelization to read
from multiple disks in parallel.

Thus, users looking for parallel I/O have two options:

1. Use a RAID setup across multiple physical disks, and expose a
   single logical disk to the hypertable (that is, via a single tablespace).

1. For each physical disk, add a separate tablespace to the
   database. Timescale allows you to actually add multiple tablespaces
   to a *single* hypertable (although under the covers, a hypertable's
   chunks are spread across the tablespaces associated with that hypertable).

We recommend a RAID setup when possible, as it supports both forms of
parallelization described above (that is, separate queries to separate
disks, single query to multiple disks in parallel). The multiple
tablespace approach only supports the former. With a RAID setup,
*no spatial partitioning is required*.

That said, when using space partitions, we recommend using 1
space partition per disk.

Timescale does *not* benefit from a very large number of space
partitions (such as the number of unique items you expect in partition
field). A very large number of such partitions leads both to poorer
per-partition load balancing (the mapping of items to partitions using
hashing), as well as much increased planning latency for some types of
queries.

### Required arguments

|Name|Type|Description|
|-|-|-|
|`hypertable`|REGCLASS|Hypertable to add the dimension to|
|`column_name`|TEXT|Column to partition by|

### Optional arguments

|Name|Type|Description|
|-|-|-|
|`number_partitions`|INTEGER|Number of hash partitions to use on `column_name`. Must be > 0|
|`chunk_time_interval`|INTERVAL|Interval that each chunk covers. Must be > 0|
|`partitioning_func`|REGCLASS|The function to use for calculating a value's partition (see `create_hypertable` [instructions][create_hypertable])|
|`if_not_exists`|BOOLEAN|Set to true to avoid throwing an error if a dimension for the column already exists. A notice is issued instead. Defaults to false|

### Returns

|Column|Type|Description|
|-|-|-|
|`dimension_id`|INTEGER|ID of the dimension in the TimescaleDB internal catalog|
|`schema_name`|TEXT|Schema name of the hypertable|
|`table_name`|TEXT|Table name of the hypertable|
|`column_name`|TEXT|Column name of the column to partition by|
|`created`|BOOLEAN|True if the dimension was added, false when `if_not_exists` is true and no dimension was added|

When executing this function, either `number_partitions` or
`chunk_time_interval` must be supplied, which dictates if the
dimension uses hash or interval partitioning.

The `chunk_time_interval` should be specified as follows:

* If the column to be partitioned is a TIMESTAMP, TIMESTAMPTZ, or
  DATE, this length should be specified either as an INTERVAL type or
  an integer value in *microseconds*.

* If the column is some other integer type, this length
  should be an integer that reflects
  the column's underlying semantics (for example, the
  `chunk_time_interval` should be given in milliseconds if this column
  is the number of milliseconds since the UNIX epoch).

<Highlight type="warning">
Supporting more than **one** additional dimension is currently
experimental. For any production environments, users are recommended
to use at most one "space" dimension.
</Highlight>

### Sample use

First convert table `conditions` to hypertable with just time
partitioning on column `time`, then add an additional partition key on `location` with four partitions:

```sql
SELECT create_hypertable('conditions', 'time');
SELECT add_dimension('conditions', 'location', number_partitions => 4);
```

Convert table `conditions` to hypertable with time partitioning on `time` and
space partitioning (2 partitions) on `location`, then add two additional dimensions.

```sql
SELECT create_hypertable('conditions', 'time', 'location', 2);
SELECT add_dimension('conditions', 'time_received', chunk_time_interval => INTERVAL '1 day');
SELECT add_dimension('conditions', 'device_id', number_partitions => 2);
SELECT add_dimension('conditions', 'device_id', number_partitions => 2, if_not_exists => true);
```

Now in a multi-node example for distributed hypertables with a cluster
of one access node and two data nodes, configure the access node for
access to the two data nodes. Then, convert table `conditions` to
a distributed hypertable with just time partitioning on column `time`,
and finally add a space partitioning dimension on `location`
with two partitions (as the number of the attached data nodes).

```sql
SELECT add_data_node('dn1', host => 'dn1.example.com');
SELECT add_data_node('dn2', host => 'dn2.example.com');
SELECT create_distributed_hypertable('conditions', 'time');
SELECT add_dimension('conditions', 'location', number_partitions => 2);
```

[create_hypertable]: /api/:currentVersion:/hypertable/create_hypertable_old/
[distributed-hypertable-partitioning-best-practices]: /use-timescale/:currentVersion:/hypertables/about-hypertables/#space-partitioning
[distributed-hypertables]: /api/:currentVersion:/distributed-hypertables/create_distributed_hypertable/
[regular-hypertables]: /api/:currentVersion:/hypertable/create_hypertable/
````
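The old interface above specifies that, for an integer column, `chunk_time_interval` must be given in the column's own units. A minimal old-API sketch of that rule (the `metrics_int` table and its milliseconds-since-epoch columns are hypothetical):

```sql
-- One day per chunk for a milliseconds-since-epoch column: 86400000 ms.
SELECT create_hypertable('metrics_int', 'created_ms', chunk_time_interval => 86400000);

-- A second range dimension on another integer column, same unit rule.
SELECT add_dimension('metrics_int', 'received_ms', chunk_time_interval => 86400000);
```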

Diff for: api/chunks.md (+1 −1)

````diff
@@ -47,7 +47,7 @@ Get information about the chunks of a hypertable.
 CREATE TABLESPACE tablespace1 location '/usr/local/pgsql/data1';
 
 CREATE TABLE hyper_int (a_col integer, b_col integer, c integer);
-SELECT table_name from create_hypertable('hyper_int', 'a_col', chunk_time_interval=> 10);
+SELECT table_name from create_hypertable('hyper_int', by_range('a_col', 10));
 CREATE OR REPLACE FUNCTION integer_now_hyper_int() returns int LANGUAGE SQL STABLE as $$ SELECT coalesce(max(a_col), 0) FROM hyper_int $$;
 SELECT set_integer_now_func('hyper_int', 'integer_now_hyper_int');
 
````
