chore: add a note about timescale.enable_chunk_skipping. (#3638)
* chore: add a note about timescale.enable_chunk_skipping.

* chore: add a note about recompression after migration from an older version of TimescaleDB.

Co-authored-by: Jônatas Davi Paganini <[email protected]>
billy-the-fish and jonatas authored Dec 6, 2024
1 parent 20aff07 commit f15f314
Showing 1 changed file with 36 additions and 43 deletions.
79 changes: 36 additions & 43 deletions api/enable_chunk_skipping.md
@@ -13,34 +13,36 @@ api:

Enable range statistics for a specific column in a **compressed** hypertable. This tracks a range of values for that column per chunk. Used for chunk pruning during query optimization.

### Required arguments
Best practice is to enable range tracking on columns that are correlated to the
partitioning column. In other words, enable tracking on secondary columns which are
referenced in the `WHERE` clauses in your queries.

|Name|Type|Description|
|-|-|-|
|`hypertable`|REGCLASS|Hypertable that the column belongs to|
|`column_name`|TEXT|Column to track range statistics for|

### Optional arguments
TimescaleDB supports min/max range tracking for the `smallint`, `int`,
`bigint`, `serial`, `bigserial`, `date`, `timestamp`, and `timestamptz` data types. The
min/max ranges are calculated when a chunk belonging to
this hypertable is compressed using the [compress_chunk][compress_chunk] function.
The range is stored in start (inclusive) and end (exclusive) form in the
`chunk_column_stats` catalog table.

|Name|Type|Description|
|-|-|-|
|`if_not_exists`|BOOLEAN|Set to `true` so that a notice is sent when ranges are not being tracked for a column. By default, an error is thrown|
This way you store the min/max values for such columns in this catalog
table at the per-chunk level. These min/max range values do
not participate in partitioning of the data. These ranges are
used for chunk pruning when the `WHERE` clause of an SQL query specifies
ranges on the column.

### Returns
A [DROP COLUMN](https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-DESC-DROP-COLUMN)
on a column with statistics tracking enabled on it ends up removing all relevant entries
from the catalog table.

|Column|Type|Description|
|-|-|-|
|`column_stats_id`|INTEGER|ID of the entry in the TimescaleDB internal catalog|
|`enabled`|BOOLEAN|Returns `true` when tracking is enabled, `if_not_exists` is `true`, and when a new entry is not added|
A [decompress_chunk][decompress_chunk] invocation on a compressed chunk resets its entries
from the `chunk_column_stats` catalog table since now it's available for DML and the
min/max range values can change on any further data manipulation in the chunk.

<Highlight type="note">
TimescaleDB supports min/max range tracking for the `smallint`, `int`,
`bigint`, `serial`, `bigserial`, `date`, `timestamp`, and `timestamptz` data types.
By default, this feature is disabled. To enable chunk skipping, set `timescale.enable_chunk_skipping = on` in
`postgresql.conf`. When you upgrade from a database instance that uses compression but does not support chunk
skipping, you need to recompress the previously compressed chunks for chunk skipping to work.

</Highlight>
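A minimal sketch of the two steps in the note, under stated assumptions. The note spells the setting `timescale.enable_chunk_skipping`; TimescaleDB settings normally carry the `timescaledb.` prefix, so the sketch assumes `timescaledb.enable_chunk_skipping` and checks the value with `SHOW`. Changing the setting per session instead of in `postgresql.conf` is also an assumption, and `conditions` is the hypertable from the sample in this file.

```sql
-- Assumed spelling of the setting; confirm it with SHOW before relying on it.
SET timescaledb.enable_chunk_skipping = on;
SHOW timescaledb.enable_chunk_skipping;

-- After an upgrade: recompress chunks that were compressed by the older
-- version so that min/max ranges are recorded for them.
-- Call enable_chunk_skipping() on the column before this step.
SELECT decompress_chunk(c, if_compressed => true)
FROM show_chunks('conditions') AS c;

SELECT compress_chunk(c, if_not_compressed => true)
FROM show_chunks('conditions') AS c;
```

Decompressing and recompressing every chunk can be expensive on a large hypertable, so you may prefer to process chunks in batches.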

### Sample use
## Samples

In this sample, you convert the `conditions` table to a hypertable with
partitioning on the `time` column. You then specify and enable additional columns to track ranges for.
@@ -50,31 +52,22 @@ SELECT create_hypertable('conditions', 'time');
SELECT enable_chunk_skipping('conditions', 'device_id');
```
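The sample can be extended into an end-to-end sketch. The table definition below is hypothetical (the columns `time`, `device_id`, and `temperature` are illustrative and do not come from this commit), and the setting name `timescaledb.enable_chunk_skipping` is an assumption based on the usual TimescaleDB prefix. Because ranges are only calculated when a chunk is compressed, the sketch compresses the chunks before querying:

```sql
-- Assumed setting name; the feature is disabled by default.
SET timescaledb.enable_chunk_skipping = on;

-- Hypothetical schema, for illustration only.
CREATE TABLE conditions (
    time        TIMESTAMPTZ NOT NULL,
    device_id   INTEGER     NOT NULL,
    temperature DOUBLE PRECISION
);

SELECT create_hypertable('conditions', 'time');

-- Compression must be enabled before chunks can be compressed.
ALTER TABLE conditions SET (timescaledb.compress);

-- Track per-chunk min/max ranges for device_id.
SELECT enable_chunk_skipping('conditions', 'device_id');

-- Ranges are calculated when each chunk is compressed.
SELECT compress_chunk(c) FROM show_chunks('conditions') AS c;

-- Inspect the recorded ranges (internal catalog; layout may change between versions).
SELECT * FROM _timescaledb_catalog.chunk_column_stats;

-- A range predicate on device_id lets the planner skip chunks
-- whose recorded range does not overlap the predicate.
EXPLAIN SELECT * FROM conditions WHERE device_id BETWEEN 10 AND 20;
```

Comparing the `EXPLAIN` output with and without the `device_id` predicate is one way to see whether chunks are being excluded.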

<Highlight type="note">
Best practice is to enable range tracking on columns that are correlated to the
partitioning column. In other words, enable tracking on secondary columns which are
referenced in the `WHERE` clauses in your queries.

The min/max ranges are calculated when a chunk belonging to
this hypertable is compressed using the [compress_chunk][compress_chunk] function.
The range is stored in start (inclusive) and end (exclusive) form in the
`chunk_column_stats` catalog table.
## Arguments

This way you store the min/max values for such columns in this catalog
table at the per-chunk level. These min/max range values do
not participate in partitioning of the data. These ranges are
used for chunk pruning when the `WHERE` clause of an SQL query specifies
ranges on the column.
| Name | Type | Default | Required | Description |
|-|-|-|-|-|
|`column_name`| `TEXT` | - | ✔ | Column to track range statistics for |
|`hypertable`| `REGCLASS` | - | ✔ | Hypertable that the column belongs to |
|`if_not_exists`| `BOOLEAN` | `false` | ✖ | Set to `true` so that a notice is sent when ranges are not being tracked for a column. By default, an error is thrown |

A [DROP COLUMN](https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-DESC-DROP-COLUMN)
on a column with statistics tracking enabled on it ends up removing all relevant entries
from the catalog table.

A [decompress_chunk][decompress_chunk] invocation on a compressed chunk resets its entries
from the `chunk_column_stats` catalog table since now it's available for DML and the
min/max range values can change on any further data manipulation in the chunk.
## Returns

</Highlight>
|Column|Type|Description|
|-|-|-|
|`column_stats_id`|INTEGER|ID of the entry in the TimescaleDB internal catalog|
|`enabled`|BOOLEAN|Returns `true` when tracking is enabled, `if_not_exists` is `true`, and when a new entry is not added|

[compress_chunk]: /api/:currentVersion:/compression/compress_chunk/
[decompress_chunk]: /api/:currentVersion:/compression/decompress_chunk/
