Skip to content

Commit 0afb88f

Browse files
billy-the-fishatovpekoJônatas Davi Paganini
authored
chore: add first cookbook recipe. (timescale#3550)
* chore: add first cookbook recipe. Co-authored-by: atovpeko <[email protected]> Co-authored-by: Jônatas Davi Paganini <[email protected]>
1 parent 20262e0 commit 0afb88f

File tree

4 files changed

+236
-0
lines changed

4 files changed

+236
-0
lines changed

_partials/_cookbook-hypertables.md

Lines changed: 67 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,67 @@
1+
2+
## Hypertable recipes
3+
4+
This section contains recipes about hypertables.
5+
6+
### Remove duplicates from an existing hypertable
7+
8+
Looking to remove duplicates from an existing hypertable? One method is to run a `PARTITION BY` query to get
9+
`ROW_NUMBER()` and then the `ctid` of rows where `row_number>1`. You then delete these rows. However,
10+
you need to check `tableoid` and `ctid`. This is because `ctid` is not unique and might be duplicated in
11+
different chunks. The following code example took 17 hours to process a table with 40 million rows:
12+
13+
```sql
14+
CREATE OR REPLACE FUNCTION deduplicate_chunks(ht_name TEXT, partition_columns TEXT, bot_id INT DEFAULT NULL)
15+
RETURNS TABLE
16+
(
17+
chunk_schema name,
18+
chunk_name name,
19+
deleted_count INT
20+
)
21+
AS
22+
$$
23+
DECLARE
24+
chunk RECORD;
25+
where_clause TEXT := '';
26+
deleted_count INT;
27+
BEGIN
28+
IF bot_id IS NOT NULL THEN
29+
where_clause := FORMAT('WHERE bot_id = %s', bot_id);
30+
END IF;
31+
32+
FOR chunk IN
33+
SELECT c.chunk_schema, c.chunk_name
34+
FROM timescaledb_information.chunks c
35+
WHERE c.hypertable_name = ht_name
36+
LOOP
37+
EXECUTE FORMAT('
38+
WITH cte AS (
39+
SELECT ctid,
40+
ROW_NUMBER() OVER (PARTITION BY %s ORDER BY %s ASC) AS row_num,
41+
*
42+
FROM %I.%I
43+
%s
44+
)
45+
DELETE FROM %I.%I
46+
WHERE ctid IN (
47+
SELECT ctid
48+
FROM cte
49+
WHERE row_num > 1
50+
)
51+
RETURNING 1;
52+
', partition_columns, partition_columns, chunk.chunk_schema, chunk.chunk_name, where_clause, chunk.chunk_schema,
53+
chunk.chunk_name)
54+
INTO deleted_count;
55+
56+
RETURN QUERY SELECT chunk.chunk_schema, chunk.chunk_name, COALESCE(deleted_count, 0);
57+
END LOOP;
58+
END
59+
$$ LANGUAGE plpgsql;
60+
61+
62+
SELECT *
63+
FROM deduplicate_chunks('nudge_events', 'bot_id, session_id, nudge_id, time', 2540);
64+
```
65+
66+
Shoutout to **Mathias Ose** and **Christopher Piggott** for this recipe.
67+

_partials/_cookbook-iot.md

Lines changed: 131 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,131 @@
1+
## IoT recipes
2+
3+
This section contains recipes for IoT issues:
4+
5+
### Work with columnar IoT data
6+
7+
Narrow and medium width tables are a great way to store IoT data. A lot of reasons are outlined in
8+
[Designing Your Database Schema: Wide vs. Narrow Postgres Tables][blog-wide-vs-narrow].
9+
10+
One of the key advantages of narrow tables is that the schema does not have to change when you add new
11+
sensors. Another big advantage is that each sensor can sample at different rates and times. This helps
12+
support things like hysteresis, where new values are written infrequently unless the value changes by a
13+
certain amount.
14+
15+
#### Narrow table format example
16+
17+
Working with narrow table data structures presents a few challenges. In the IoT world one concern is that
18+
many data analysis approaches - including machine learning as well as more traditional data analysis -
19+
require that your data is resampled and synchronized to a common time basis. Fortunately, TimescaleDB provides
20+
you with [hyperfunctions][hyperfunctions] and other tools to help you work with this data.
21+
22+
An example of a narrow table format is:
23+
24+
| ts | sensor_id | value |
25+
|-------------------------|-----------|-------|
26+
| 2024-10-31 11:17:30.000 | 1007 | 23.45 |
27+
28+
Typically you would couple this with a sensor table:
29+
30+
| sensor_id | sensor_name | units |
31+
|-----------|--------------|--------------------------|
32+
| 1007 | temperature | degreesC |
33+
| 1012 | heat_mode | on/off |
34+
| 1013 | cooling_mode | on/off |
35+
| 1041 | occupancy | number of people in room |
36+
37+
A medium table retains the generic structure but adds columns of various types so that you can
38+
use the same table to store float, int, bool, or even JSON (jsonb) data:
39+
40+
| ts | sensor_id | d | i | b | t | j |
41+
|-------------------------|-----------|-------|------|------|------|------|
42+
| 2024-10-31 11:17:30.000 | 1007 | 23.45 | null | null | null | null |
43+
| 2024-10-31 11:17:47.000 | 1012 | null | null | TRUE | null | null |
44+
| 2024-10-31 11:18:01.000 | 1041 | null | 4 | null | null | null |
45+
46+
To remove all-null entries, use an optional constraint such as:
47+
48+
```sql
49+
CONSTRAINT at_least_one_not_null
50+
CHECK ((d IS NOT NULL) OR (i IS NOT NULL) OR (b IS NOT NULL) OR (j IS NOT NULL) OR (t IS NOT NULL))
51+
```
52+
53+
#### Get the last value of every sensor
54+
55+
There are several ways to get the latest value of every sensor. The following examples use the
56+
structure defined in [Narrow table format example][setup-a-narrow-table-format] as a reference:
57+
58+
- [SELECT DISTINCT ON][select-distinct-on]
59+
- [JOIN LATERAL][join-lateral]
60+
61+
##### SELECT DISTINCT ON
62+
63+
If you have a list of sensors, the easy way to get the latest value of every sensor is to use
64+
`SELECT DISTINCT ON`:
65+
66+
```sql
67+
WITH latest_data AS (
68+
SELECT DISTINCT ON (sensor_id) ts, sensor_id, d
69+
FROM iot_data
70+
WHERE d is not null
71+
AND ts > CURRENT_TIMESTAMP - INTERVAL '1 week' -- important
72+
ORDER BY sensor_id, ts DESC
73+
)
74+
SELECT
75+
sensor_id, sensors.name, ts, d
76+
FROM latest_data
77+
LEFT OUTER JOIN sensors ON latest_data.sensor_id = sensors.id
78+
WHERE latest_data.d is not null
79+
ORDER BY sensor_id, ts; -- Optional, for displaying results ordered by sensor_id
80+
```
81+
82+
The common table expression (CTE) used above is not strictly necessary. However, it is an elegant way to join
83+
to the sensor list to get a sensor name in the output. If this is not something you care about,
84+
you can leave it out:
85+
86+
```sql
87+
SELECT DISTINCT ON (sensor_id) ts, sensor_id, d
88+
FROM iot_data
89+
WHERE d is not null
90+
AND ts > CURRENT_TIMESTAMP - INTERVAL '1 week' -- important
91+
ORDER BY sensor_id, ts DESC
92+
```
93+
94+
It is important to take care when down-selecting this data. In the previous examples,
95+
the time that the query would scan back was limited. However, if there any sensors that have either
96+
not reported in a long time or in the worst case, never reported, this query devolves to a full table scan.
97+
In a database with 1000+ sensors and 41 million rows, an unconstrained query takes over an hour.
98+
99+
#### JOIN LATERAL
100+
101+
An alternative to [SELECT DISTINCT ON][select-distinct-on] is to use a `JOIN LATERAL`. By selecting your entire
102+
sensor list from the sensors table rather than pulling the IDs out using `SELECT DISTINCT`, `JOIN LATERAL` can offer
103+
some improvements in performance:
104+
105+
```sql
106+
SELECT sensor_list.id, latest_data.ts, latest_data.d
107+
FROM sensors sensor_list
108+
-- Add a WHERE clause here to downselect the sensor list, if you wish
109+
LEFT JOIN LATERAL (
110+
SELECT ts, d
111+
FROM iot_data raw_data
112+
WHERE sensor_id = sensor_list.id
113+
ORDER BY ts DESC
114+
LIMIT 1
115+
) latest_data ON true
116+
WHERE latest_data.d is not null -- only pulling out float values ("d" column) in this example
117+
AND latest_data.ts > CURRENT_TIMESTAMP - interval '1 week' -- important
118+
ORDER BY sensor_list.id, latest_data.ts;
119+
```
120+
121+
Limiting the time range is important, especially if you have a lot of data. Best practice is to use these
122+
kinds of queries for dashboards and quick status checks. To query over a much larger time range, encapsulate
123+
the previous example into a materialized query that refreshes infrequently, perhaps once a day.
124+
125+
Shoutout to **Christopher Piggott** for this recipe.
126+
127+
[blog-wide-vs-narrow]: https://www.timescale.com/learn/designing-your-database-schema-wide-vs-narrow-postgres-tables
128+
[setup-a-narrow-table-format]: /tutorials/:currentVersion:/cookbook/#narrow-table-format-example
129+
[select-distinct-on]: /tutorials/:currentVersion:/cookbook/#select-distinct-on
130+
[join-lateral]: /tutorials/:currentVersion:/cookbook/#join-lateral
131+
[hyperfunctions]: /use/:currentVersion:/hyperfunctions/

tutorials/cookbook.md

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
---
2+
title: Timescale cookbook
3+
excerpt: Code examples from the community that help you with loads of common conundrums.
4+
product: [cloud, mst, self_hosted]
5+
---
6+
7+
import Hypertables from "versionContent/_partials/_cookbook-hypertables.mdx";
8+
import IOT from "versionContent/_partials/_cookbook-iot.mdx";
9+
10+
11+
# Timescale community cookbook
12+
13+
This page contains suggestions from the [TimescaleDB Community](https://timescaledb.slack.com/) about how to resolve
14+
common issues. Use these code examples as guidance to work with your own data.
15+
16+
17+
## Prerequisites
18+
19+
To follow the examples in this page, you need a:
20+
21+
- [Target Timescale Cloud service][create-a-service]
22+
- [Connection to your service][connect-to-service]
23+
24+
25+
26+
<Hypertables />
27+
28+
<IOT />
29+
30+
31+
32+
[create-a-service]: /getting-started/:currentVersion:/services/#create-a-timescale-cloud-service
33+
[connect-to-service]: /getting-started/:currentVersion:/run-queries-from-console/

tutorials/page-index/page-index.js

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -154,6 +154,11 @@ module.exports = [
154154
href: "simulate-iot-sensor-data",
155155
excerpt: "Simulate and query an IoT sensor dataset",
156156
},
157+
{
158+
title: "Timescale community cookbook",
159+
href: "cookbook",
160+
excerpt: "Code examples from the community that help you with loads of common conundrums.",
161+
},
157162
],
158163
},
159164
];

0 commit comments

Comments
 (0)