Document new hypertable API (#2732)
Just make minimal changes to document the new API. Sections on using
the new API will follow.
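
For example, the same two-dimension hypertable in the old style and the
new style (a sketch based on the diffs below; `by_range` and `by_hash`
are the new dimension builders):

```sql
-- Old interface: positional time and space partitioning arguments.
SELECT create_hypertable('conditions', 'time', 'location', 4);

-- New generalized interface: dimension builders.
SELECT create_hypertable('conditions', by_range('time'));
SELECT add_dimension('conditions', by_hash('location', 4));
```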

Co-authored-by: Erik Nordström <[email protected]>
mkindahl and erimatnor authored Oct 27, 2023
1 parent 8c3bb92 commit 5726611
Showing 44 changed files with 578 additions and 187 deletions.
2 changes: 1 addition & 1 deletion _partials/_create-hypertable-blockchain.md
@@ -36,7 +36,7 @@ with Timescale tables similar to standard PostgreSQL.
the timestamp data to use for partitioning:

```sql
SELECT create_hypertable('transactions', 'time');
SELECT create_hypertable('transactions', by_range('time'));
```

1. Create an index on the `hash` column to make queries for individual
2 changes: 1 addition & 1 deletion _partials/_create-hypertable-energy.md
@@ -27,7 +27,7 @@ with Timescale tables similar to standard PostgreSQL.
the timestamp data to use for partitioning:

```sql
SELECT create_hypertable('metrics', 'created');
SELECT create_hypertable('metrics', by_range('created'));
```

</Procedure>
3 changes: 2 additions & 1 deletion _partials/_create-hypertable-nyctaxis.md
@@ -42,7 +42,8 @@ with Timescale tables similar to standard PostgreSQL.
the timestamp data to use for partitioning:

```sql
SELECT create_hypertable('rides', 'pickup_datetime', 'payment_type', 2, create_default_indexes=>FALSE);
SELECT create_hypertable('rides', by_range('pickup_datetime'), create_default_indexes=>FALSE);
SELECT add_dimension('rides', by_hash('payment_type', 2));
```

1. Create an index to support efficient queries by vendor, rate code, and
2 changes: 1 addition & 1 deletion _partials/_create-hypertable-twelvedata-crypto.md
@@ -29,7 +29,7 @@ with Timescale tables similar to standard PostgreSQL.
the timestamp data to use for partitioning:

```sql
SELECT create_hypertable('crypto_ticks', 'time');
SELECT create_hypertable('crypto_ticks', by_range('time'));
```

</Procedure>
2 changes: 1 addition & 1 deletion _partials/_create-hypertable-twelvedata-stocks.md
@@ -29,7 +29,7 @@ with Timescale tables similar to standard PostgreSQL.
the timestamp data to use for partitioning:

```sql
SELECT create_hypertable('stocks_real_time','time');
SELECT create_hypertable('stocks_real_time', by_range('time'));
```

1. Create an index to support efficient queries on the `symbol` and `time`
@@ -37,8 +37,7 @@ CREATE TABLE conditions(
temperature INT NOT NULL);

SELECT create_hypertable(
'conditions', 'day',
chunk_time_interval => INTERVAL '1 day'
'conditions', by_range('day', INTERVAL '1 day')
);

INSERT INTO conditions (day, city, temperature) VALUES
2 changes: 1 addition & 1 deletion api/add_data_node.md
@@ -102,7 +102,7 @@ TimescaleDB extension on the data node unless it is already installed.
### Sample usage

Suppose you have an existing hypertable `conditions` and want to use `time`
as the time partitioning column and `location` as the space partitioning
as the range partitioning column and `location` as the hash partitioning
column. You also want to distribute the chunks of the hypertable on two
data nodes `dn1.example.com` and `dn2.example.com`:
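
The file's actual code sample is collapsed in this view; as a sketch, it
presumably follows the same pattern as the multi-node example in
`api/add_dimension.md` below:

```sql
-- Register the two data nodes on the access node, then create the
-- distributed hypertable partitioned by range on `time` and by hash
-- on `location`, with one hash partition per data node.
SELECT add_data_node('dn1', host => 'dn1.example.com');
SELECT add_data_node('dn2', host => 'dn2.example.com');
SELECT create_distributed_hypertable('conditions', 'time', 'location', 2);
```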

80 changes: 30 additions & 50 deletions api/add_dimension.md
@@ -13,38 +13,40 @@ api:

Add an additional partitioning dimension to a Timescale hypertable.
The column selected as the dimension can either use interval
partitioning (for example, for a second time partition) or hash partitioning.
partitioning (for example, for a second range partition) or hash partitioning.

**Note: this reference describes the new generalized hypertable API. The [old interface for `add_dimension` is also available](add_dimension_old.md).**

<Highlight type="warning">
The `add_dimension` command can only be executed after a table has been
converted to a hypertable (via `create_hypertable`) and, like
`create_hypertable`, it must be run on an empty hypertable.
</Highlight>

**Space partitions**: Using space partitions is highly recommended
**Hash partitions (previously called space partitions)**: Using hash partitions is highly recommended
for [distributed hypertables][distributed-hypertables] to achieve
efficient scale-out performance. For [regular hypertables][regular-hypertables]
that exist only on a single node, additional partitioning can be used
for specialized use cases but is not recommended for most users.

Space partitions use hashing: Every distinct item is hashed to one of
*N* buckets. Remember that we are already using (flexible) time
intervals to manage chunk sizes; the main purpose of space
Every distinct item in hash partitioning is hashed to one of
*N* buckets. Remember that we are already using (flexible) range
intervals to manage chunk sizes; the main purpose of hash
partitioning is to enable parallelization across multiple
data nodes (in the case of distributed hypertables) or
across multiple disks within the same time interval
(in the case of single-node deployments).
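
For example (a sketch assuming a `conditions` hypertable that is already
range-partitioned on `time`, as in the sample use below):

```sql
-- Hash-partition on `location`: each distinct value is hashed into one
-- of 4 buckets, so rows from the same time interval can be spread
-- across disks or data nodes.
SELECT add_dimension('conditions', by_hash('location', 4));
```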

### Parallelizing queries across multiple data nodes

In a distributed hypertable, space partitioning enables inserts to be
In a distributed hypertable, hash partitioning enables inserts to be
parallelized across data nodes, even while the inserted rows share
timestamps from the same time interval, and thus increases the ingest rate.
Query performance also benefits by being able to parallelize queries
across nodes, particularly when full or partial aggregations can be
"pushed down" to data nodes (for example, as in the query
`avg(temperature) FROM conditions GROUP BY hour, location`
when using `location` as a space partition). Please see our
when using `location` as a hash partition). Please see our
[best practices about partitioning in distributed hypertables][distributed-hypertable-partitioning-best-practices]
for more information.
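
Written out in full, that pushed-down aggregation looks like the following
(a sketch assuming `conditions` has `time`, `location`, and `temperature`
columns):

```sql
-- With `location` as a hash dimension, each data node computes partial
-- averages locally; the access node then combines the partials.
SELECT time_bucket('1 hour', time) AS hour,
       location,
       avg(temperature)
FROM conditions
GROUP BY hour, location;
```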

@@ -71,10 +73,10 @@ disks, single query to multiple disks in parallel). The multiple
tablespace approach only supports the former. With a RAID setup,
*no spatial partitioning is required*.

That said, when using space partitions, we recommend using 1
space partition per disk.
That said, when using hash partitions, we recommend using 1
hash partition per disk.
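
For example (a sketch assuming tablespaces `disk1` and `disk2` were
already created, one per physical disk):

```sql
-- Attach one tablespace per disk; the hypertable's chunks are spread
-- across all tablespaces attached to it.
SELECT attach_tablespace('disk1', 'conditions');
SELECT attach_tablespace('disk2', 'conditions');
```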

Timescale does *not* benefit from a very large number of space
Timescale does *not* benefit from a very large number of hash
partitions (such as the number of unique items you expect in the partition
field). A very large number of such partitions leads both to poorer
per-partition load balancing (the mapping of items to partitions using
@@ -86,7 +88,7 @@ queries.
|Name|Type|Description|
|-|-|-|
|`hypertable`|REGCLASS|Hypertable to add the dimension to|
|`column_name`|TEXT|Column to partition by|
|`dimension`|DIMENSION_INFO|Dimension to partition by|

### Optional arguments

@@ -102,66 +104,44 @@ queries.
|Column|Type|Description|
|-|-|-|
|`dimension_id`|INTEGER|ID of the dimension in the TimescaleDB internal catalog|
|`schema_name`|TEXT|Schema name of the hypertable|
|`table_name`|TEXT|Table name of the hypertable|
|`column_name`|TEXT|Column name of the column to partition by|
|`created`|BOOLEAN|True if the dimension was added, false when `if_not_exists` is true and no dimension was added|

When executing this function, either `number_partitions` or
`chunk_time_interval` must be supplied, which dictates if the
dimension uses hash or interval partitioning.

The `chunk_time_interval` should be specified as follows:

* If the column to be partitioned is a TIMESTAMP, TIMESTAMPTZ, or
DATE, this length should be specified either as an INTERVAL type or
an integer value in *microseconds*.

* If the column is some other integer type, this length
should be an integer that reflects
the column's underlying semantics (for example, the
`chunk_time_interval` should be given in milliseconds if this column
is the number of milliseconds since the UNIX epoch).

<Highlight type="warning">
Supporting more than **one** additional dimension is currently
experimental. For any production environments, users are recommended
to use at most one "space" dimension.

</Highlight>

### Sample use

First convert table `conditions` to hypertable with just time
partitioning on column `time`, then add an additional partition key on `location` with four partitions:
First convert table `conditions` to a hypertable with just range
partitioning on column `time`, then add an additional partition key on
`location` with four partitions:

```sql
SELECT create_hypertable('conditions', 'time');
SELECT add_dimension('conditions', 'location', number_partitions => 4);
SELECT create_hypertable('conditions', by_range('time'));
SELECT add_dimension('conditions', by_hash('location', 4));
```

Convert table `conditions` to hypertable with time partitioning on `time` and
space partitioning (2 partitions) on `location`, then add two additional dimensions.
Convert table `conditions` to a hypertable with range partitioning on
`time`, then add three additional dimensions: one hash partition on
`location`, one range partition on `time_received`, and one hash
partition on `device_id`. The final statement repeats `device_id` with
`if_not_exists => true`, which issues a notice instead of an error.

```sql
SELECT create_hypertable('conditions', 'time', 'location', 2);
SELECT add_dimension('conditions', 'time_received', chunk_time_interval => INTERVAL '1 day');
SELECT add_dimension('conditions', 'device_id', number_partitions => 2);
SELECT add_dimension('conditions', 'device_id', number_partitions => 2, if_not_exists => true);
SELECT create_hypertable('conditions', by_range('time'));
SELECT add_dimension('conditions', by_hash('location', 2));
SELECT add_dimension('conditions', by_range('time_received', INTERVAL '1 day'));
SELECT add_dimension('conditions', by_hash('device_id', 2));
SELECT add_dimension('conditions', by_hash('device_id', 2), if_not_exists => true);
```

Now in a multi-node example for distributed hypertables with a cluster
of one access node and two data nodes, configure the access node for
access to the two data nodes. Then, convert table `conditions` to
a distributed hypertable with just time partitioning on column `time`,
and finally add a space partitioning dimension on `location`
a distributed hypertable with just range partitioning on column `time`,
and finally add a hash partitioning dimension on `location`
with two partitions (as the number of the attached data nodes).

```sql
SELECT add_data_node('dn1', host => 'dn1.example.com');
SELECT add_data_node('dn2', host => 'dn2.example.com');
SELECT create_distributed_hypertable('conditions', 'time');
SELECT add_dimension('conditions', 'location', number_partitions => 2);
SELECT add_dimension('conditions', by_hash('location', 2));
```

[create_hypertable]: /api/:currentVersion:/hypertable/create_hypertable/
170 changes: 170 additions & 0 deletions api/add_dimension_old.md
@@ -0,0 +1,170 @@
---
api_name: add_dimension()
excerpt: Add a space-partitioning dimension to a hypertable
topics: [hypertables]
keywords: [hypertables, partitions]
tags: [dimensions, chunks]
api:
license: apache
type: function
---

# add_dimension()

Add an additional partitioning dimension to a Timescale hypertable.
The column selected as the dimension can either use interval
partitioning (for example, for a second time partition) or hash partitioning.

<Highlight type="warning">
The `add_dimension` command can only be executed after a table has been
converted to a hypertable (via `create_hypertable`) and, like
`create_hypertable`, it must be run on an empty hypertable.
</Highlight>

**Space partitions**: Using space partitions is highly recommended
for [distributed hypertables][distributed-hypertables] to achieve
efficient scale-out performance. For [regular hypertables][regular-hypertables]
that exist only on a single node, additional partitioning can be used
for specialized use cases but is not recommended for most users.

Space partitions use hashing: Every distinct item is hashed to one of
*N* buckets. Remember that we are already using (flexible) time
intervals to manage chunk sizes; the main purpose of space
partitioning is to enable parallelization across multiple
data nodes (in the case of distributed hypertables) or
across multiple disks within the same time interval
(in the case of single-node deployments).

### Parallelizing queries across multiple data nodes

In a distributed hypertable, space partitioning enables inserts to be
parallelized across data nodes, even while the inserted rows share
timestamps from the same time interval, and thus increases the ingest rate.
Query performance also benefits by being able to parallelize queries
across nodes, particularly when full or partial aggregations can be
"pushed down" to data nodes (for example, as in the query
`avg(temperature) FROM conditions GROUP BY hour, location`
when using `location` as a space partition). Please see our
[best practices about partitioning in distributed hypertables][distributed-hypertable-partitioning-best-practices]
for more information.

### Parallelizing disk I/O on a single node

Parallel I/O can help in two scenarios: (a) two or more concurrent
queries should be able to read from different disks in parallel, or
(b) a single query should be able to use query parallelization to read
from multiple disks in parallel.

Thus, users looking for parallel I/O have two options:

1. Use a RAID setup across multiple physical disks, and expose a
single logical disk to the hypertable (that is, via a single tablespace).

1. For each physical disk, add a separate tablespace to the
database. Timescale allows you to actually add multiple tablespaces
to a *single* hypertable (although under the covers, a hypertable's
chunks are spread across the tablespaces associated with that hypertable).

We recommend a RAID setup when possible, as it supports both forms of
parallelization described above (that is, separate queries to separate
disks, single query to multiple disks in parallel). The multiple
tablespace approach only supports the former. With a RAID setup,
*no spatial partitioning is required*.

That said, when using space partitions, we recommend using 1
space partition per disk.

Timescale does *not* benefit from a very large number of space
partitions (such as the number of unique items you expect in the partition
field). A very large number of such partitions leads both to poorer
per-partition load balancing (the mapping of items to partitions using
hashing), as well as much increased planning latency for some types of
queries.

### Required arguments

|Name|Type|Description|
|-|-|-|
|`hypertable`|REGCLASS|Hypertable to add the dimension to|
|`column_name`|TEXT|Column to partition by|

### Optional arguments

|Name|Type|Description|
|-|-|-|
|`number_partitions`|INTEGER|Number of hash partitions to use on `column_name`. Must be > 0|
|`chunk_time_interval`|INTERVAL|Interval that each chunk covers. Must be > 0|
|`partitioning_func`|REGCLASS|The function to use for calculating a value's partition (see `create_hypertable` [instructions][create_hypertable])|
|`if_not_exists`|BOOLEAN|Set to true to avoid throwing an error if a dimension for the column already exists. A notice is issued instead. Defaults to false|

### Returns

|Column|Type|Description|
|-|-|-|
|`dimension_id`|INTEGER|ID of the dimension in the TimescaleDB internal catalog|
|`schema_name`|TEXT|Schema name of the hypertable|
|`table_name`|TEXT|Table name of the hypertable|
|`column_name`|TEXT|Column name of the column to partition by|
|`created`|BOOLEAN|True if the dimension was added, false when `if_not_exists` is true and no dimension was added|

When executing this function, either `number_partitions` or
`chunk_time_interval` must be supplied, which dictates if the
dimension uses hash or interval partitioning.

The `chunk_time_interval` should be specified as follows:

* If the column to be partitioned is a TIMESTAMP, TIMESTAMPTZ, or
DATE, this length should be specified either as an INTERVAL type or
an integer value in *microseconds*.

* If the column is some other integer type, this length
should be an integer that reflects
the column's underlying semantics (for example, the
`chunk_time_interval` should be given in milliseconds if this column
is the number of milliseconds since the UNIX epoch).
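
For example (a sketch assuming a hypothetical `events` hypertable whose
integer `ts_ms` column stores milliseconds since the UNIX epoch):

```sql
-- One-day chunks on a millisecond column: the interval is an integer in
-- the column's own units (86400000 ms = 1 day).
SELECT add_dimension('events', 'ts_ms', chunk_time_interval => 86400000);
```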

<Highlight type="warning">
Supporting more than **one** additional dimension is currently
experimental. For any production environments, users are recommended
to use at most one "space" dimension.

</Highlight>

### Sample use

First convert table `conditions` to a hypertable with just time
partitioning on column `time`, then add an additional partition key on `location` with four partitions:

```sql
SELECT create_hypertable('conditions', 'time');
SELECT add_dimension('conditions', 'location', number_partitions => 4);
```

Convert table `conditions` to a hypertable with time partitioning on `time` and
space partitioning (2 partitions) on `location`, then add two additional
dimensions. The final statement repeats `device_id` with
`if_not_exists => true`, which issues a notice instead of an error.

```sql
SELECT create_hypertable('conditions', 'time', 'location', 2);
SELECT add_dimension('conditions', 'time_received', chunk_time_interval => INTERVAL '1 day');
SELECT add_dimension('conditions', 'device_id', number_partitions => 2);
SELECT add_dimension('conditions', 'device_id', number_partitions => 2, if_not_exists => true);
```

Now in a multi-node example for distributed hypertables with a cluster
of one access node and two data nodes, configure the access node for
access to the two data nodes. Then, convert table `conditions` to
a distributed hypertable with just time partitioning on column `time`,
and finally add a space partitioning dimension on `location`
with two partitions (as the number of the attached data nodes).

```sql
SELECT add_data_node('dn1', host => 'dn1.example.com');
SELECT add_data_node('dn2', host => 'dn2.example.com');
SELECT create_distributed_hypertable('conditions', 'time');
SELECT add_dimension('conditions', 'location', number_partitions => 2);
```

[create_hypertable]: /api/:currentVersion:/hypertable/create_hypertable_old/
[distributed-hypertable-partitioning-best-practices]: /use-timescale/:currentVersion:/hypertables/about-hypertables/#space-partitioning
[distributed-hypertables]: /api/:currentVersion:/distributed-hypertables/create_distributed_hypertable/
[regular-hypertables]: /api/:currentVersion:/hypertable/create_hypertable/
2 changes: 1 addition & 1 deletion api/chunks.md
@@ -47,7 +47,7 @@ Get information about the chunks of a hypertable.
CREATE TABLESPACE tablespace1 location '/usr/local/pgsql/data1';

CREATE TABLE hyper_int (a_col integer, b_col integer, c integer);
SELECT table_name from create_hypertable('hyper_int', 'a_col', chunk_time_interval=> 10);
SELECT table_name from create_hypertable('hyper_int', by_range('a_col', 10));
CREATE OR REPLACE FUNCTION integer_now_hyper_int() returns int LANGUAGE SQL STABLE as $$ SELECT coalesce(max(a_col), 0) FROM hyper_int $$;
SELECT set_integer_now_func('hyper_int', 'integer_now_hyper_int');
