Merge branch 'latest' into add-guide-on-materialized-ctes
atovpeko authored Dec 13, 2024
2 parents 2621119 + 5dd7ed7 commit f66d0f3
Showing 50 changed files with 1,092 additions and 633 deletions.
67 changes: 67 additions & 0 deletions _partials/_cookbook-hypertables.md
@@ -0,0 +1,67 @@

## Hypertable recipes

This section contains recipes about hypertables.

### Remove duplicates from an existing hypertable

Looking to remove duplicates from an existing hypertable? One method is to run a query that uses `ROW_NUMBER()`
with `PARTITION BY`, then delete the rows where `row_number > 1` by `ctid`. However, you need to check both
`tableoid` and `ctid`, because `ctid` alone is not unique and may repeat across different chunks. The following
code example took 17 hours to process a table with 40 million rows:

```sql
CREATE OR REPLACE FUNCTION deduplicate_chunks(ht_name TEXT, partition_columns TEXT, bot_id INT DEFAULT NULL)
    RETURNS TABLE
            (
                chunk_schema  name,
                chunk_name    name,
                deleted_count INT
            )
AS
$$
DECLARE
    chunk         RECORD;
    where_clause  TEXT := '';
    deleted_count INT;
BEGIN
    IF bot_id IS NOT NULL THEN
        where_clause := FORMAT('WHERE bot_id = %s', bot_id);
    END IF;

    FOR chunk IN
        SELECT c.chunk_schema, c.chunk_name
        FROM timescaledb_information.chunks c
        WHERE c.hypertable_name = ht_name
        LOOP
            -- Count the deleted rows in a CTE so that deleted_count reflects the
            -- real number of rows removed (a bare RETURNING 1 ... INTO only
            -- captures the first returned row).
            EXECUTE FORMAT('
                WITH cte AS (
                    SELECT ctid,
                           ROW_NUMBER() OVER (PARTITION BY %s ORDER BY %s ASC) AS row_num,
                           *
                    FROM %I.%I
                    %s
                ),
                deleted AS (
                    DELETE FROM %I.%I
                    WHERE ctid IN (
                        SELECT ctid
                        FROM cte
                        WHERE row_num > 1
                    )
                    RETURNING 1
                )
                SELECT COUNT(*) FROM deleted;
            ', partition_columns, partition_columns, chunk.chunk_schema, chunk.chunk_name, where_clause,
                           chunk.chunk_schema, chunk.chunk_name)
                INTO deleted_count;

            RETURN QUERY SELECT chunk.chunk_schema, chunk.chunk_name, COALESCE(deleted_count, 0);
        END LOOP;
END
$$ LANGUAGE plpgsql;


SELECT *
FROM deduplicate_chunks('nudge_events', 'bot_id, session_id, nudge_id, time', 2540);
```
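Before running a destructive delete, it can help to see how many duplicate groups exist and which chunks they
live in. The following is a sketch, assuming the `nudge_events` table and partitioning columns from the example
above:

```sql
-- Hypothetical sanity check: list duplicate groups and the chunks they live in.
-- tableoid disambiguates rows whose ctid values collide across chunks.
-- Column names are assumptions taken from the example call above.
SELECT tableoid::regclass AS chunk,
       bot_id, session_id, nudge_id, "time",
       COUNT(*) AS copies
FROM nudge_events
GROUP BY 1, 2, 3, 4, 5
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```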

Shoutout to **Mathias Ose** and **Christopher Piggott** for this recipe.

131 changes: 131 additions & 0 deletions _partials/_cookbook-iot.md
@@ -0,0 +1,131 @@
## IoT recipes

This section contains recipes for common IoT scenarios.

### Work with columnar IoT data

Narrow and medium-width tables are a great way to store IoT data. Many of the reasons are outlined in
[Designing Your Database Schema: Wide vs. Narrow Postgres Tables][blog-wide-vs-narrow].

One of the key advantages of narrow tables is that the schema does not have to change when you add new
sensors. Another big advantage is that each sensor can sample at different rates and times. This helps
support things like hysteresis, where new values are written infrequently unless the value changes by a
certain amount.

#### Narrow table format example

Working with narrow table data structures presents a few challenges. In the IoT world, one concern is that
many data analysis approaches - including machine learning as well as more traditional methods -
require your data to be resampled and synchronized to a common time basis. Fortunately, TimescaleDB provides
you with [hyperfunctions][hyperfunctions] and other tools to help you work with this data.

An example of a narrow table format is:

| ts | sensor_id | value |
|-------------------------|-----------|-------|
| 2024-10-31 11:17:30.000 | 1007 | 23.45 |

Typically you would couple this with a sensor table:

| sensor_id | sensor_name | units |
|-----------|--------------|--------------------------|
| 1007 | temperature | degreesC |
| 1012 | heat_mode | on/off |
| 1013 | cooling_mode | on/off |
| 1041 | occupancy | number of people in room |
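For reference, the narrow layout and its companion sensor table might be declared as follows. This is a
sketch: the table and column names are assumptions based on the examples in this section.

```sql
CREATE TABLE sensors (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    units TEXT
);

CREATE TABLE iot_data_narrow (
    ts        TIMESTAMPTZ NOT NULL,
    sensor_id INTEGER NOT NULL REFERENCES sensors (id),
    value     DOUBLE PRECISION NOT NULL
);

-- Partition by time to get TimescaleDB chunking, compression, and retention
SELECT create_hypertable('iot_data_narrow', 'ts');
```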

A medium table retains the generic structure but adds columns of various types so that you can
use the same table to store float, int, bool, or even JSON (jsonb) data:

| ts | sensor_id | d | i | b | t | j |
|-------------------------|-----------|-------|------|------|------|------|
| 2024-10-31 11:17:30.000 | 1007 | 23.45 | null | null | null | null |
| 2024-10-31 11:17:47.000 | 1012 | null | null | TRUE | null | null |
| 2024-10-31 11:18:01.000 | 1041 | null | 4 | null | null | null |

To reject rows where every value column is null, add an optional constraint such as:

```sql
CONSTRAINT at_least_one_not_null
CHECK ((d IS NOT NULL) OR (i IS NOT NULL) OR (b IS NOT NULL) OR (j IS NOT NULL) OR (t IS NOT NULL))
```
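Putting the medium-width layout together, a table definition might look like this. It is a sketch: the
`iot_data` name and columns match the queries later in this section, and `create_hypertable` assumes
TimescaleDB is installed.

```sql
CREATE TABLE iot_data (
    ts        TIMESTAMPTZ NOT NULL,
    sensor_id INTEGER     NOT NULL,
    d         DOUBLE PRECISION, -- float readings
    i         INTEGER,          -- integer readings
    b         BOOLEAN,          -- on/off states
    t         TEXT,             -- text values
    j         JSONB,            -- structured payloads
    CONSTRAINT at_least_one_not_null
        CHECK ((d IS NOT NULL) OR (i IS NOT NULL) OR (b IS NOT NULL) OR (j IS NOT NULL) OR (t IS NOT NULL))
);

SELECT create_hypertable('iot_data', 'ts');
```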

#### Get the last value of every sensor

There are several ways to get the latest value of every sensor. The following examples use the
structure defined in [Narrow table format example][setup-a-narrow-table-format] as a reference:

- [SELECT DISTINCT ON][select-distinct-on]
- [JOIN LATERAL][join-lateral]

##### SELECT DISTINCT ON

If you have a list of sensors, the easiest way to get the latest value of every sensor is to use
`SELECT DISTINCT ON`:

```sql
WITH latest_data AS (
    SELECT DISTINCT ON (sensor_id) ts, sensor_id, d
    FROM iot_data
    WHERE d IS NOT NULL
      AND ts > CURRENT_TIMESTAMP - INTERVAL '1 week' -- important
    ORDER BY sensor_id, ts DESC
)
SELECT sensor_id, sensors.name, ts, d
FROM latest_data
LEFT OUTER JOIN sensors ON latest_data.sensor_id = sensors.id
WHERE latest_data.d IS NOT NULL
ORDER BY sensor_id, ts; -- Optional, for displaying results ordered by sensor_id
```

The common table expression (CTE) used above is not strictly necessary. However, it is an elegant way to join
to the sensors table so that the output includes a sensor name. If you don't need that, you can leave
it out:

```sql
SELECT DISTINCT ON (sensor_id) ts, sensor_id, d
FROM iot_data
WHERE d IS NOT NULL
  AND ts > CURRENT_TIMESTAMP - INTERVAL '1 week' -- important
ORDER BY sensor_id, ts DESC;
```

Take care when down-selecting this data. The previous examples limit how far back in time the query
scans. However, if any sensors have either not reported in a long time or, in the worst case, never reported,
this query devolves into a full table scan. In a database with more than 1,000 sensors and 41 million rows,
an unconstrained query can take over an hour.
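One way to keep per-sensor latest-value lookups fast, regardless of when a sensor last reported, is a
composite index leading with the sensor column. This is a general PostgreSQL tuning sketch rather than part
of the original recipe, and it assumes the `iot_data` table used in the queries above:

```sql
-- Lets the planner walk each sensor's rows newest-first and stop at the first
-- match, instead of scanning the whole table.
CREATE INDEX IF NOT EXISTS iot_data_sensor_ts_idx
    ON iot_data (sensor_id, ts DESC);
```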

##### JOIN LATERAL

An alternative to [SELECT DISTINCT ON][select-distinct-on] is to use a `JOIN LATERAL`. By selecting your entire
sensor list from the sensors table, rather than pulling the IDs out with `SELECT DISTINCT`, a `JOIN LATERAL` can
offer better performance:

```sql
SELECT sensor_list.id, latest_data.ts, latest_data.d
FROM sensors sensor_list
-- Add a WHERE clause here to downselect the sensor list, if you wish
LEFT JOIN LATERAL (
    SELECT ts, d
    FROM iot_data raw_data
    WHERE sensor_id = sensor_list.id
    ORDER BY ts DESC
    LIMIT 1
) latest_data ON TRUE
WHERE latest_data.d IS NOT NULL -- only pulling out float values ("d" column) in this example
  AND latest_data.ts > CURRENT_TIMESTAMP - INTERVAL '1 week' -- important
ORDER BY sensor_list.id, latest_data.ts;
```

Limiting the time range is important, especially if you have a lot of data. Best practice is to use these
kinds of queries for dashboards and quick status checks. To query over a much larger time range, encapsulate
the previous example in a materialized view that refreshes infrequently, perhaps once a day.
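As a sketch, the lateral-join query can be wrapped in a standard PostgreSQL materialized view and refreshed
on a schedule; the view name here is an assumption:

```sql
CREATE MATERIALIZED VIEW latest_sensor_values AS
SELECT sensor_list.id, latest_data.ts, latest_data.d
FROM sensors sensor_list
LEFT JOIN LATERAL (
    SELECT ts, d
    FROM iot_data
    WHERE sensor_id = sensor_list.id
    ORDER BY ts DESC
    LIMIT 1
) latest_data ON TRUE;

-- Refresh infrequently, for example once a day from cron or a scheduled job:
REFRESH MATERIALIZED VIEW latest_sensor_values;
```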

Shoutout to **Christopher Piggott** for this recipe.

[blog-wide-vs-narrow]: https://www.timescale.com/learn/designing-your-database-schema-wide-vs-narrow-postgres-tables
[setup-a-narrow-table-format]: /tutorials/:currentVersion:/cookbook/#narrow-table-format-example
[select-distinct-on]: /tutorials/:currentVersion:/cookbook/#select-distinct-on
[join-lateral]: /tutorials/:currentVersion:/cookbook/#join-lateral
[hyperfunctions]: /use-timescale/:currentVersion:/hyperfunctions/
2 changes: 1 addition & 1 deletion _partials/_install-self-hosted-docker-based.mdx
@@ -12,7 +12,7 @@ In Terminal:
[TimescaleDB Toolkit](https://github.com/timescale/timescaledb-toolkit), and support for PostGIS and Patroni.
The lighter-weight `timescale/timescaledb:latest-pg17` non-ha image uses [Alpine][alpine].

TimescaleDB is pre-created in the default Postgres database in both the -ha and non-ha docker images.
TimescaleDB is pre-created in the default PostgreSQL database in both the -ha and non-ha docker images.
By default, TimescaleDB is added to any new database you create in these images.

1. **Run the container**
2 changes: 1 addition & 1 deletion _partials/_migrate_dump_postgresql.md
@@ -1,6 +1,6 @@
import MigrationSetupFirstSteps from "versionContent/_partials/_migrate_set_up_database_first_steps.mdx";
import MigrationSetupDBConnectionPostgresql from "versionContent/_partials/_migrate_set_up_align_db_extensions_postgres_based.mdx";
import MigrationProcedureDumpSchemaPostgres from "versionContent/_partials/_migrate_dump_roles_schema_data_postgres.mdx";
import MigrationProcedureDumpSchemaPostgreSQL from "versionContent/_partials/_migrate_dump_roles_schema_data_postgres.mdx";
import MigrationValidateRestartApp from "versionContent/_partials/_migrate_validate_and_restart_app.mdx";

## Prepare to migrate
4 changes: 2 additions & 2 deletions _partials/_migrate_install_psql_ec2_instance.md
@@ -60,7 +60,7 @@
```sh
export SOURCE="postgres://<Master username>:<Master password>@<Endpoint>:<Port>/<DB name>"
```
The value of `Master password` was supplied when this Postgres RDS instance was created.
The value of `Master password` was supplied when this PostgreSQL RDS instance was created.
1. Test your connection:
```sh
@@ -71,4 +71,4 @@
</Procedure>
[about-hypertables]: /use-timescale/:currentVersion:/hypertables/about-hypertables/
[data-compression]: /use-timescale/:currentVersion:/compression/about-compression/
[data-compression]: /use-timescale/:currentVersion:/compression/about-compression/
8 changes: 4 additions & 4 deletions _partials/_migrate_live_migrate_faq_all.md
@@ -90,10 +90,10 @@ This rare phenomenon may happen when:
following GUCs to the recommended values on the source RDS instance.

```shell
psql -X -d $SOURCE -c 'alter system set tcp_keepalives_count=60'
psql -X -d $SOURCE -c 'alter system set tcp_keepalives_idle=10'
psql -X -d $SOURCE -c 'alter system set tcp_keepalives_interval=10'
psql -X -d $SOURCE -c 'alter system set wal_sender_timeout=30m'
psql -X -d $SOURCE -c "alter system set tcp_keepalives_count=60"
psql -X -d $SOURCE -c "alter system set tcp_keepalives_idle=10"
psql -X -d $SOURCE -c "alter system set tcp_keepalives_interval=10"
psql -X -d $SOURCE -c "alter system set wal_sender_timeout='30min'"
```

For more information, see [https://github.com/dimitri/pgcopydb/issues/773#issuecomment-2139093365](https://github.com/dimitri/pgcopydb/issues/773#issuecomment-2139093365)
6 changes: 3 additions & 3 deletions _partials/_migrate_live_setup_connection_strings.md
@@ -2,8 +2,8 @@ These variables hold the connection information for the source database and targ
In Terminal on your migration machine, set the following:

```bash
export SOURCE=postgres://<user>:<password>@<source host>:<source port>/<db_name>
export TARGET=postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require
export SOURCE="postgres://<user>:<password>@<source host>:<source port>/<db_name>"
export TARGET="postgres://tsdbadmin:<PASSWORD>@<HOST>:<PORT>/tsdb?sslmode=require"
```
You find the connection information for your Timescale Cloud service in the configuration file you
downloaded when you created the service.
downloaded when you created the service.
10 changes: 5 additions & 5 deletions _partials/_migrate_live_tune_source_database.md
@@ -18,10 +18,10 @@ a managed service, follow the instructions in the `From MST` tab on this page.

1. **Tune system messaging**
```shell
psql -X -d $SOURCE -c 'alter system set tcp_keepalives_count=60'
psql -X -d $SOURCE -c 'alter system set tcp_keepalives_idle=10'
psql -X -d $SOURCE -c 'alter system set tcp_keepalives_interval=10'
psql -X -d $SOURCE -c 'alter system set wal_sender_timeout=30m'
psql -X -d $SOURCE -c "alter system set tcp_keepalives_count=60"
psql -X -d $SOURCE -c "alter system set tcp_keepalives_idle=10"
psql -X -d $SOURCE -c "alter system set tcp_keepalives_interval=10"
psql -X -d $SOURCE -c "alter system set wal_sender_timeout='30min'"
```

1. **Restart the source database**
@@ -33,4 +33,4 @@ a managed service, follow the instructions in the `From MST` tab on this page.

<EnableReplication />

[install-wal2json]: https://github.com/eulerto/wal2json
[install-wal2json]: https://github.com/eulerto/wal2json
2 changes: 1 addition & 1 deletion _partials/_migrate_live_tune_source_database_awsrds.md
@@ -25,7 +25,7 @@ Updating parameters on a PostgreSQL instance will cause an outage. Choose a time
- `tcp_keepalives_count` set to `60`: the number of messages that can be lost before the client is considered dead.
- `tcp_keepalives_idle` set to `10`: the amount of time with no network activity before the operating system sends a TCP keepalive message to the client.
- `tcp_keepalives_interval` set to `10`: the amount of time before an unacknowledged TCP keepalive message is retransmitted.
- `wal_sender_timeout` set to `30m`: the maximum time to wait for WAL replication.
- `wal_sender_timeout` set to `30min`: the maximum time to wait for WAL replication.


1. In RDS, navigate back to your [databases][databases], select the RDS instance to migrate and click `Modify`.
1 change: 0 additions & 1 deletion _partials/_migrate_prerequisites.md
@@ -19,7 +19,6 @@ Before you migrate your data:
[all available extensions]: /migrate/:currentVersion:/troubleshooting/#extension-availability
[tablespaces]: /migrate/:currentVersion:/troubleshooting/#tablespaces
[no-superuser-for-timescale-instance]: /migrate/:currentVersion:/troubleshooting/#superuser-privileges
[upgrade instructions]: /self-hosted/:currentVersion:/upgrades/about-upgrades/
[pg_hbaconf]: https://www.timescale.com/blog/5-common-connection-errors-in-postgresql-and-how-to-solve-them/#no-pg_hbaconf-entry-for-host
[create-ec2-instance]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EC2_GetStarted.html#ec2-launch-instance
[adjust-maintenance-window]: /use-timescale/:currentVersion:/upgrades/#adjusting-your-maintenance-window
36 changes: 36 additions & 0 deletions _partials/_migrate_self_postgres_check_versions.md
@@ -0,0 +1,36 @@
<Procedure>

To see the versions of PostgreSQL and TimescaleDB running in a self-hosted database instance:

1. **Set your connection string**

This variable holds the connection information for the database to upgrade:

```bash
export SOURCE="postgres://<user>:<password>@<source host>:<source port>/<db_name>"
```

1. **Retrieve the version of PostgreSQL that you are running**
```shell
psql -X -d $SOURCE -c "SELECT version();"
```
PostgreSQL returns something like:
```shell
                                                                version
-----------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 17.2 (Ubuntu 17.2-1.pgdg22.04+1) on aarch64-unknown-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 row)
```

1. **Retrieve the version of TimescaleDB that you are running**
```shell
psql -X -d $SOURCE -c "\dx timescaledb;"
```
PostgreSQL returns something like:
```shell
Name | Version | Schema | Description
-------------+---------+------------+---------------------------------------------------------------------
timescaledb | 2.17.2 | public | Enables scalable inserts and complex queries for time-series data
(1 row)
```

</Procedure>
37 changes: 37 additions & 0 deletions _partials/_migrate_self_postgres_implement_migration_path.md
@@ -0,0 +1,37 @@
<Procedure>

You cannot upgrade TimescaleDB and PostgreSQL at the same time. Upgrade each product separately, in the
following order:

1. **Upgrade TimescaleDB**

```shell
psql -X -d $SOURCE -c "ALTER EXTENSION timescaledb UPDATE TO '<version number>';"
```

1. **If your migration path dictates it, upgrade PostgreSQL**

Follow the procedure in [Upgrade PostgreSQL][upgrade-pg]. The version of TimescaleDB installed
in your PostgreSQL deployment must be the same before and after the PostgreSQL upgrade.

1. **If your migration path dictates it, upgrade TimescaleDB again**

```shell
psql -X -d $SOURCE -c "ALTER EXTENSION timescaledb UPDATE TO '<version number>';"
```

1. **Check that you have upgraded to the correct version of TimescaleDB**

```shell
psql -X -d $SOURCE -c "\dx timescaledb;"
```
PostgreSQL returns something like:
```shell
Name | Version | Schema | Description
-------------+---------+--------+---------------------------------------------------------------------------------------
timescaledb | 2.17.2 | public | Enables scalable inserts and complex queries for time-series data (Community Edition)
```

</Procedure>

[upgrade-pg]: /self-hosted/:currentVersion:/upgrades/upgrade-pg/#upgrade-your-postgresql-instance
23 changes: 23 additions & 0 deletions _partials/_migrate_self_postgres_plan_migration_path.md
@@ -0,0 +1,23 @@

import SupportMatrix from "versionContent/_partials/_migrate_self_postgres_timescaledb_compatibility.mdx";

Best practice is to always use the latest version of TimescaleDB. Subscribe to our releases on GitHub, or use
Timescale Cloud and always get the latest updates without any hassle.

Check the following support matrix against the versions of TimescaleDB and PostgreSQL that you are running currently
and the versions you want to update to, then choose your upgrade path.

For example, to upgrade from TimescaleDB 2.13 on PostgreSQL 13 to TimescaleDB 2.17.2 you need to:
1. Upgrade TimescaleDB to 2.16
1. Upgrade PostgreSQL to 14 or higher
1. Upgrade TimescaleDB to 2.17.2.

You may need to [upgrade to the latest PostgreSQL version][upgrade-pg] before you upgrade TimescaleDB. Also,
if you use [TimescaleDB Toolkit][toolkit-install], ensure that the `timescaledb_toolkit` extension is version
1.6.0 or later before you upgrade the TimescaleDB extension.
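To confirm the installed Toolkit version before upgrading, you can query the extension catalog. This is a
sketch; it assumes the extension is already installed:

```sql
-- Returns one row with the installed timescaledb_toolkit version, if present
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'timescaledb_toolkit';
```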

<SupportMatrix />

[upgrade-pg]: /self-hosted/:currentVersion:/upgrades/upgrade-pg/#upgrade-your-postgresql-instance
[timescale-toolkit]: https://github.com/timescale/timescaledb-toolkit
[toolkit-install]: /self-hosted/:currentVersion:/tooling/install-toolkit/