Merge branch 'latest' into add-guide-on-materialized-ctes
atovpeko authored Dec 13, 2024
2 parents 2621119 + 5dd7ed7 commit f66d0f3
Showing 50 changed files with 1,092 additions and 633 deletions.
67 changes: 67 additions & 0 deletions _partials/_cookbook-hypertables.md
@@ -0,0 +1,67 @@

## Hypertable recipes

This section contains recipes about hypertables.

### Remove duplicates from an existing hypertable

Looking to remove duplicates from an existing hypertable? One method is to run a query that uses `ROW_NUMBER()`
with `PARTITION BY`, then delete the rows where `row_number > 1` by `ctid`. However, you need to check both
`tableoid` and `ctid`, because `ctid` alone is not unique and may repeat across different chunks. The following
code example took 17 hours to process a table with 40 million rows:

```sql
CREATE OR REPLACE FUNCTION deduplicate_chunks(ht_name TEXT, partition_columns TEXT, bot_id INT DEFAULT NULL)
    RETURNS TABLE
            (
                chunk_schema  name,
                chunk_name    name,
                deleted_count INT
            )
AS
$$
DECLARE
    chunk         RECORD;
    where_clause  TEXT := '';
    deleted_count INT;
BEGIN
    IF bot_id IS NOT NULL THEN
        where_clause := FORMAT('WHERE bot_id = %s', bot_id);
    END IF;

    FOR chunk IN
        SELECT c.chunk_schema, c.chunk_name
        FROM timescaledb_information.chunks c
        WHERE c.hypertable_name = ht_name
        LOOP
            -- Count the deleted rows in a CTE so that deleted_count reflects the
            -- real number of rows removed (a bare RETURNING 1 ... INTO only
            -- captures the first returned row).
            EXECUTE FORMAT('
                WITH cte AS (
                    SELECT ctid,
                           ROW_NUMBER() OVER (PARTITION BY %s ORDER BY %s ASC) AS row_num,
                           *
                    FROM %I.%I
                    %s
                ),
                deleted AS (
                    DELETE FROM %I.%I
                    WHERE ctid IN (
                        SELECT ctid
                        FROM cte
                        WHERE row_num > 1
                    )
                    RETURNING 1
                )
                SELECT COUNT(*) FROM deleted;
            ', partition_columns, partition_columns, chunk.chunk_schema, chunk.chunk_name, where_clause,
                           chunk.chunk_schema, chunk.chunk_name)
                INTO deleted_count;

            RETURN QUERY SELECT chunk.chunk_schema, chunk.chunk_name, COALESCE(deleted_count, 0);
        END LOOP;
END
$$ LANGUAGE plpgsql;


SELECT *
FROM deduplicate_chunks('nudge_events', 'bot_id, session_id, nudge_id, time', 2540);
```
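Before running a destructive delete, it can help to see how many duplicate groups exist and which chunks they
live in. The following is a sketch, assuming the `nudge_events` table and partitioning columns from the example
above:

```sql
-- Hypothetical sanity check: list duplicate groups and the chunks they live in.
-- tableoid disambiguates rows whose ctid values collide across chunks.
-- Column names are assumptions taken from the example call above.
SELECT tableoid::regclass AS chunk,
       bot_id, session_id, nudge_id, "time",
       COUNT(*) AS copies
FROM nudge_events
GROUP BY 1, 2, 3, 4, 5
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```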

Shoutout to **Mathias Ose** and **Christopher Piggott** for this recipe.

131 changes: 131 additions & 0 deletions _partials/_cookbook-iot.md
@@ -0,0 +1,131 @@
## IoT recipes

This section contains recipes for common IoT scenarios.

### Work with columnar IoT data

Narrow and medium-width tables are a great way to store IoT data. Many of the reasons are outlined in
[Designing Your Database Schema: Wide vs. Narrow Postgres Tables][blog-wide-vs-narrow].

One of the key advantages of narrow tables is that the schema does not have to change when you add new
sensors. Another big advantage is that each sensor can sample at different rates and times. This helps
support things like hysteresis, where new values are written infrequently unless the value changes by a
certain amount.

#### Narrow table format example

Working with narrow table data structures presents a few challenges. In the IoT world, one concern is that
many data analysis approaches - including machine learning as well as more traditional methods -
require your data to be resampled and synchronized to a common time basis. Fortunately, TimescaleDB provides
you with [hyperfunctions][hyperfunctions] and other tools to help you work with this data.

An example of a narrow table format is:

| ts | sensor_id | value |
|-------------------------|-----------|-------|
| 2024-10-31 11:17:30.000 | 1007 | 23.45 |

Typically you would couple this with a sensor table:

| sensor_id | sensor_name | units |
|-----------|--------------|--------------------------|
| 1007 | temperature | degreesC |
| 1012 | heat_mode | on/off |
| 1013 | cooling_mode | on/off |
| 1041 | occupancy | number of people in room |
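For reference, the narrow layout and its companion sensor table might be declared as follows. This is a
sketch: the table and column names are assumptions based on the examples in this section.

```sql
CREATE TABLE sensors (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    units TEXT
);

CREATE TABLE iot_data_narrow (
    ts        TIMESTAMPTZ NOT NULL,
    sensor_id INTEGER NOT NULL REFERENCES sensors (id),
    value     DOUBLE PRECISION NOT NULL
);

-- Partition by time to get TimescaleDB chunking, compression, and retention
SELECT create_hypertable('iot_data_narrow', 'ts');
```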

A medium table retains the generic structure but adds columns of various types so that you can
use the same table to store float, int, bool, or even JSON (jsonb) data:

| ts | sensor_id | d | i | b | t | j |
|-------------------------|-----------|-------|------|------|------|------|
| 2024-10-31 11:17:30.000 | 1007 | 23.45 | null | null | null | null |
| 2024-10-31 11:17:47.000 | 1012 | null | null | TRUE | null | null |
| 2024-10-31 11:18:01.000 | 1041 | null | 4 | null | null | null |

To reject rows where every value column is null, add an optional constraint such as:

```sql
CONSTRAINT at_least_one_not_null
CHECK ((d IS NOT NULL) OR (i IS NOT NULL) OR (b IS NOT NULL) OR (j IS NOT NULL) OR (t IS NOT NULL))
```
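Putting the medium-width layout together, a table definition might look like this. It is a sketch: the
`iot_data` name and columns match the queries later in this section, and `create_hypertable` assumes
TimescaleDB is installed.

```sql
CREATE TABLE iot_data (
    ts        TIMESTAMPTZ NOT NULL,
    sensor_id INTEGER     NOT NULL,
    d         DOUBLE PRECISION, -- float readings
    i         INTEGER,          -- integer readings
    b         BOOLEAN,          -- on/off states
    t         TEXT,             -- text values
    j         JSONB,            -- structured payloads
    CONSTRAINT at_least_one_not_null
        CHECK ((d IS NOT NULL) OR (i IS NOT NULL) OR (b IS NOT NULL) OR (j IS NOT NULL) OR (t IS NOT NULL))
);

SELECT create_hypertable('iot_data', 'ts');
```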

#### Get the last value of every sensor

There are several ways to get the latest value of every sensor. The following examples use the
structure defined in [Narrow table format example][setup-a-narrow-table-format] as a reference:

- [SELECT DISTINCT ON][select-distinct-on]
- [JOIN LATERAL][join-lateral]

##### SELECT DISTINCT ON

If you have a list of sensors, the easiest way to get the latest value of every sensor is to use
`SELECT DISTINCT ON`:

```sql
WITH latest_data AS (
    SELECT DISTINCT ON (sensor_id) ts, sensor_id, d
    FROM iot_data
    WHERE d IS NOT NULL
      AND ts > CURRENT_TIMESTAMP - INTERVAL '1 week' -- important
    ORDER BY sensor_id, ts DESC
)
SELECT sensor_id, sensors.name, ts, d
FROM latest_data
LEFT OUTER JOIN sensors ON latest_data.sensor_id = sensors.id
WHERE latest_data.d IS NOT NULL
ORDER BY sensor_id, ts; -- Optional, for displaying results ordered by sensor_id
```

The common table expression (CTE) used above is not strictly necessary. However, it is an elegant way to join
to the sensors table so that the output includes a sensor name. If you don't need that, you can leave
it out:

```sql
SELECT DISTINCT ON (sensor_id) ts, sensor_id, d
FROM iot_data
WHERE d IS NOT NULL
  AND ts > CURRENT_TIMESTAMP - INTERVAL '1 week' -- important
ORDER BY sensor_id, ts DESC;
```

Take care when down-selecting this data. The previous examples limit how far back in time the query
scans. However, if any sensors have either not reported in a long time or, in the worst case, never reported,
this query devolves into a full table scan. In a database with more than 1,000 sensors and 41 million rows,
an unconstrained query can take over an hour.
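One way to keep per-sensor latest-value lookups fast, regardless of when a sensor last reported, is a
composite index leading with the sensor column. This is a general PostgreSQL tuning sketch rather than part
of the original recipe, and it assumes the `iot_data` table used in the queries above:

```sql
-- Lets the planner walk each sensor's rows newest-first and stop at the first
-- match, instead of scanning the whole table.
CREATE INDEX IF NOT EXISTS iot_data_sensor_ts_idx
    ON iot_data (sensor_id, ts DESC);
```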

##### JOIN LATERAL

An alternative to [SELECT DISTINCT ON][select-distinct-on] is to use a `JOIN LATERAL`. By selecting your entire
sensor list from the sensors table, rather than pulling the IDs out with `SELECT DISTINCT`, a `JOIN LATERAL` can
offer better performance:

```sql
SELECT sensor_list.id, latest_data.ts, latest_data.d
FROM sensors sensor_list
-- Add a WHERE clause here to downselect the sensor list, if you wish
LEFT JOIN LATERAL (
    SELECT ts, d
    FROM iot_data raw_data
    WHERE sensor_id = sensor_list.id
    ORDER BY ts DESC
    LIMIT 1
) latest_data ON TRUE
WHERE latest_data.d IS NOT NULL -- only pulling out float values ("d" column) in this example
  AND latest_data.ts > CURRENT_TIMESTAMP - INTERVAL '1 week' -- important
ORDER BY sensor_list.id, latest_data.ts;
```

Limiting the time range is important, especially if you have a lot of data. Best practice is to use these
kinds of queries for dashboards and quick status checks. To query over a much larger time range, encapsulate
the previous example in a materialized view that refreshes infrequently, perhaps once a day.
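As a sketch, the lateral-join query can be wrapped in a standard PostgreSQL materialized view and refreshed
on a schedule; the view name here is an assumption:

```sql
CREATE MATERIALIZED VIEW latest_sensor_values AS
SELECT sensor_list.id, latest_data.ts, latest_data.d
FROM sensors sensor_list
LEFT JOIN LATERAL (
    SELECT ts, d
    FROM iot_data
    WHERE sensor_id = sensor_list.id
    ORDER BY ts DESC
    LIMIT 1
) latest_data ON TRUE;

-- Refresh infrequently, for example once a day from cron or a scheduled job:
REFRESH MATERIALIZED VIEW latest_sensor_values;
```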

Shoutout to **Christopher Piggott** for this recipe.

[blog-wide-vs-narrow]: https://www.timescale.com/learn/designing-your-database-schema-wide-vs-narrow-postgres-tables
[setup-a-narrow-table-format]: /tutorials/:currentVersion:/cookbook/#narrow-table-format-example
[select-distinct-on]: /tutorials/:currentVersion:/cookbook/#select-distinct-on
[join-lateral]: /tutorials/:currentVersion:/cookbook/#join-lateral
[hyperfunctions]: /use-timescale/:currentVersion:/hyperfunctions/
2 changes: 1 addition & 1 deletion _partials/_install-self-hosted-docker-based.mdx
@@ -12,7 +12,7 @@ In Terminal:
[TimescaleDB Toolkit](https://github.com/timescale/timescaledb-toolkit), and support for PostGIS and Patroni.
The lighter-weight `timescale/timescaledb:latest-pg17` non-ha image uses [Alpine][alpine].

TimescaleDB is pre-created in the default Postgres database in both the -ha and non-ha docker images.
TimescaleDB is pre-created in the default PostgreSQL database in both the -ha and non-ha docker images.
By default, TimescaleDB is added to any new database you create in these images.

1. **Run the container**
2 changes: 1 addition & 1 deletion _partials/_migrate_dump_postgresql.md
@@ -1,6 +1,6 @@
import MigrationSetupFirstSteps from "versionContent/_partials/_migrate_set_up_database_first_steps.mdx";
import MigrationSetupDBConnectionPostgresql from "versionContent/_partials/_migrate_set_up_align_db_extensions_postgres_based.mdx";
import MigrationProcedureDumpSchemaPostgres from "versionContent/_partials/_migrate_dump_roles_schema_data_postgres.mdx";
import MigrationProcedureDumpSchemaPostgreSQL from "versionContent/_partials/_migrate_dump_roles_schema_data_postgres.mdx";
import MigrationValidateRestartApp from "versionContent/_partials/_migrate_validate_and_restart_app.mdx";

## Prepare to migrate
4 changes: 2 additions & 2 deletions _partials/_migrate_install_psql_ec2_instance.md
@@ -60,7 +60,7 @@
```sh
export SOURCE="postgres://<Master username>:<Master password>@<Endpoint>:<Port>/<DB name>"
```
The value of `Master password` was supplied when this Postgres RDS instance was created.
The value of `Master password` was supplied when this PostgreSQL RDS instance was created.
1. Test your connection:
```sh
@@ -71,4 +71,4 @@
</Procedure>
[about-hypertables]: /use-timescale/:currentVersion:/hypertables/about-hypertables/
[data-compression]: /use-timescale/:currentVersion:/compression/about-compression/
[data-compression]: /use-timescale/:currentVersion:/compression/about-compression/
8 changes: 4 additions & 4 deletions _partials/_migrate_live_migrate_faq_all.md
@@ -90,10 +90,10 @@ This rare phenomenon may happen when:
following GUCs to the recommended values on the source RDS instance.

```shell
psql -X -d $SOURCE -c 'alter system set tcp_keepalives_count=60'
psql -X -d $SOURCE -c 'alter system set tcp_keepalives_idle=10'
psql -X -d $SOURCE -c 'alter system set tcp_keepalives_interval=10'
psql -X -d $SOURCE -c 'alter system set wal_sender_timeout=30m'
psql -X -d $SOURCE -c "alter system set tcp_keepalives_count=60"
psql -X -d $SOURCE -c "alter system set tcp_keepalives_idle=10"
psql -X -d $SOURCE -c "alter system set tcp_keepalives_interval=10"
psql -X -d $SOURCE -c "alter system set wal_sender_timeout='30min'"
```

For more information, see [https://github.com/dimitri/pgcopydb/issues/773#issuecomment-2139093365](https://github.com/dimitri/pgcopydb/issues/773#issuecomment-2139093365)
6 changes: 3 additions & 3 deletions _partials/_migrate_live_setup_connection_strings.md
@@ -2,8 +2,8 @@ These variables hold the connection information for the source database and targ
In Terminal on your migration machine, set the following:

```bash
export SOURCE=postgres://<user>:<password>@<source host>:<source port>/<db_name>
export TARGET=postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require
export SOURCE="postgres://<user>:<password>@<source host>:<source port>/<db_name>"
export TARGET="postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require"
```
You find the connection information for your Timescale Cloud service in the configuration file you
downloaded when you created the service.
downloaded when you created the service.
10 changes: 5 additions & 5 deletions _partials/_migrate_live_tune_source_database.md
@@ -18,10 +18,10 @@ a managed service, follow the instructions in the `From MST` tab on this page.

1. **Tune system messaging**
```shell
psql -X -d $SOURCE -c 'alter system set tcp_keepalives_count=60'
psql -X -d $SOURCE -c 'alter system set tcp_keepalives_idle=10'
psql -X -d $SOURCE -c 'alter system set tcp_keepalives_interval=10'
psql -X -d $SOURCE -c 'alter system set wal_sender_timeout=30m'
psql -X -d $SOURCE -c "alter system set tcp_keepalives_count=60"
psql -X -d $SOURCE -c "alter system set tcp_keepalives_idle=10"
psql -X -d $SOURCE -c "alter system set tcp_keepalives_interval=10"
psql -X -d $SOURCE -c "alter system set wal_sender_timeout='30min'"
```

1. **Restart the source database**
@@ -33,4 +33,4 @@ a managed service, follow the instructions in the `From MST` tab on this page.

<EnableReplication />

[install-wal2json]: https://github.com/eulerto/wal2json
[install-wal2json]: https://github.com/eulerto/wal2json
2 changes: 1 addition & 1 deletion _partials/_migrate_live_tune_source_database_awsrds.md
@@ -25,7 +25,7 @@ Updating parameters on a PostgreSQL instance will cause an outage. Choose a time
- `tcp_keepalives_count` set to `60`: the number of messages that can be lost before the client is considered dead.
- `tcp_keepalives_idle` set to `10`: the amount of time with no network activity before the operating system sends a TCP keepalive message to the client.
- `tcp_keepalives_interval` set to `10`: the amount of time before an unacknowledged TCP keepalive message is retransmitted.
- `wal_sender_timeout` set to `30m`: the maximum time to wait for WAL replication.
- `wal_sender_timeout` set to `30min`: the maximum time to wait for WAL replication.


1. In RDS, navigate back to your [databases][databases], select the RDS instance to migrate and click `Modify`.
1 change: 0 additions & 1 deletion _partials/_migrate_prerequisites.md
@@ -19,7 +19,6 @@ Before you migrate your data:
[all available extensions]: /migrate/:currentVersion:/troubleshooting/#extension-availability
[tablespaces]: /migrate/:currentVersion:/troubleshooting/#tablespaces
[no-superuser-for-timescale-instance]: /migrate/:currentVersion:/troubleshooting/#superuser-privileges
[upgrade instructions]: /self-hosted/:currentVersion:/upgrades/about-upgrades/
[pg_hbaconf]: https://www.timescale.com/blog/5-common-connection-errors-in-postgresql-and-how-to-solve-them/#no-pg_hbaconf-entry-for-host
[create-ec2-instance]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance
[adjust-maintenance-window]: /use-timescale/:currentVersion:/upgrades/#adjusting-your-maintenance-window
36 changes: 36 additions & 0 deletions _partials/_migrate_self_postgres_check_versions.md
@@ -0,0 +1,36 @@
<Procedure>

To see the versions of PostgreSQL and TimescaleDB running in a self-hosted database instance:

1. **Set your connection string**

This variable holds the connection information for the database to upgrade:

```bash
export SOURCE="postgres://<user>:<password>@<source host>:<source port>/<db_name>"
```

1. **Retrieve the version of PostgreSQL that you are running**
```shell
psql -X -d $SOURCE -c "SELECT version();"
```
PostgreSQL returns something like:
```shell
                                                                version
-----------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 17.2 (Ubuntu 17.2-1.pgdg22.04+1) on aarch64-unknown-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 row)
```

1. **Retrieve the version of TimescaleDB that you are running**
```shell
psql -X -d $SOURCE -c "\dx timescaledb;"
```
PostgreSQL returns something like:
```shell
Name | Version | Schema | Description
-------------+---------+------------+---------------------------------------------------------------------
timescaledb | 2.17.2 | public | Enables scalable inserts and complex queries for time-series data
(1 row)
```

</Procedure>
37 changes: 37 additions & 0 deletions _partials/_migrate_self_postgres_implement_migration_path.md
@@ -0,0 +1,37 @@
<Procedure>

You cannot upgrade TimescaleDB and PostgreSQL at the same time. Upgrade each product separately, in the
following order:

1. **Upgrade TimescaleDB**

```shell
psql -X -d $SOURCE -c "ALTER EXTENSION timescaledb UPDATE TO '<version number>';"
```

1. **If your migration path dictates it, upgrade PostgreSQL**

Follow the procedure in [Upgrade PostgreSQL][upgrade-pg]. The version of TimescaleDB installed
in your PostgreSQL deployment must be the same before and after the PostgreSQL upgrade.

1. **If your migration path dictates it, upgrade TimescaleDB again**

```shell
psql -X -d $SOURCE -c "ALTER EXTENSION timescaledb UPDATE TO '<version number>';"
```

1. **Check that you have upgraded to the correct version of TimescaleDB**

```shell
psql -X -d $SOURCE -c "\dx timescaledb;"
```
PostgreSQL returns something like:
```shell
Name | Version | Schema | Description
-------------+---------+--------+---------------------------------------------------------------------------------------
timescaledb | 2.17.2 | public | Enables scalable inserts and complex queries for time-series data (Community Edition)
```

</Procedure>

[upgrade-pg]: /self-hosted/:currentVersion:/upgrades/upgrade-pg/#upgrade-your-postgresql-instance
23 changes: 23 additions & 0 deletions _partials/_migrate_self_postgres_plan_migration_path.md
@@ -0,0 +1,23 @@

import SupportMatrix from "versionContent/_partials/_migrate_self_postgres_timescaledb_compatibility.mdx";

Best practice is to always use the latest version of TimescaleDB. Subscribe to our releases on GitHub, or use
Timescale Cloud and always get the latest updates without any hassle.

Check the following support matrix against the versions of TimescaleDB and PostgreSQL that you are running currently
and the versions you want to update to, then choose your upgrade path.

For example, to upgrade from TimescaleDB 2.13 on PostgreSQL 13 to TimescaleDB 2.17.2 you need to:
1. Upgrade TimescaleDB to 2.16
1. Upgrade PostgreSQL to 14 or higher
1. Upgrade TimescaleDB to 2.17.2.

You may need to [upgrade to the latest PostgreSQL version][upgrade-pg] before you upgrade TimescaleDB. Also,
if you use [TimescaleDB Toolkit][toolkit-install], ensure that the `timescaledb_toolkit` extension is version
1.6.0 or later before you upgrade the TimescaleDB extension.
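To confirm the installed Toolkit version before upgrading, you can query the extension catalog. This is a
sketch; it assumes the extension is already installed:

```sql
-- Returns one row with the installed timescaledb_toolkit version, if present
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'timescaledb_toolkit';
```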

<SupportMatrix />

[upgrade-pg]: /self-hosted/:currentVersion:/upgrades/upgrade-pg/#upgrade-your-postgresql-instance
[timescale-toolkit]: https://github.com/timescale/timescaledb-toolkit
[toolkit-install]: /self-hosted/:currentVersion:/tooling/install-toolkit/