
Commit d6ac65c

authored
Merge pull request #3654 from ClickHouse/uk_postgres
fix uk, projections and postgres
2 parents 8ccb531 + 00fc8c0 commit d6ac65c

7 files changed

+224
-342
lines changed


docs/best-practices/partionning_keys.md

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ With partitioning enabled, ClickHouse only [merges](/merges) data parts within,
4444

4545
<Image img={merges_with_partitions} size="md" alt="Partitions" />
4646

47-
## Applications of partitioning {#applications-of-partionning}
47+
## Applications of partitioning {#applications-of-partitioning}
4848

4949
Partitioning is a powerful tool for managing large datasets in ClickHouse, especially in observability and analytics use cases. It enables efficient data life cycle operations by allowing entire partitions, often aligned with time or business logic, to be dropped, moved, or archived in a single metadata operation. This is significantly faster and less resource-intensive than row-level delete or copy operations. Partitioning also integrates cleanly with ClickHouse features like TTL and tiered storage, making it possible to implement retention policies or hot/cold storage strategies without custom orchestration. For example, recent data can be kept on fast SSD-backed storage, while older partitions are automatically moved to cheaper object storage.
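
As a rough illustration of the life cycle operations described above, the sketch below defines a monthly-partitioned table with a retention TTL and then drops one month of data. The table `example_events` and its columns are hypothetical and not part of the documentation change; the 12-month retention period is likewise an arbitrary assumption.

```sql
-- Hypothetical table for illustration only: one partition per calendar month.
CREATE TABLE example_events
(
    event_time DateTime,
    message String
)
ENGINE = MergeTree
PARTITION BY toYYYYMM(event_time)
ORDER BY event_time
-- Assumed retention policy: rows expire 12 months after event_time.
TTL event_time + INTERVAL 12 MONTH DELETE;

-- Remove all of January 2024 by dropping its partition
-- instead of deleting rows one by one.
ALTER TABLE example_events DROP PARTITION 202401;
```

Because the partition key is `toYYYYMM(event_time)`, the partition for January 2024 is addressed by the value `202401`.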
5050

docs/data-modeling/projections.md

Lines changed: 168 additions & 31 deletions
@@ -47,6 +47,44 @@ read as shown in the figure below:
4747

4848
<Image img={projections_1} size="lg" alt="Projections in ClickHouse"/>
4949

50+
## When to use Projections? {#when-to-use-projections}
51+
52+
Projections are an appealing feature for new users as they are automatically
53+
maintained as data is inserted. Furthermore, queries can just be sent to a
54+
single table where the projections are exploited where possible to speed up
55+
the response time.
56+
57+
This is in contrast to Materialized Views, where the user has to select the
58+
appropriate optimized target table or rewrite their query, depending on the
59+
filters. This places greater emphasis on user applications and increases
60+
client-side complexity.
61+
62+
Despite these advantages, projections come with some inherent limitations which
63+
users should be aware of, so projections should be deployed sparingly.
64+
65+
- Projections don't allow using different TTL for the source table and the
66+
(hidden) target table; Materialized Views allow different TTLs.
67+
- Projections don't currently support `optimize_read_in_order` for the (hidden)
68+
target table.
69+
- Lightweight updates and deletes are not supported for tables with projections.
70+
- Materialized Views can be chained: the target table of one Materialized View
71+
can be the source table of another Materialized View, and so on. This is not
72+
possible with projections.
73+
- Projections don't support joins, but Materialized Views do.
74+
- Projections don't support filters (`WHERE` clause), but Materialized Views do.
75+
76+
We recommend using projections when:
77+
78+
- A complete re-ordering of the data is required. While the expression in the
79+
projection can, in theory, use a `GROUP BY`, materialized views are more
80+
effective for maintaining aggregates. The query optimizer is also more likely
81+
to exploit projections that use a simple reordering, i.e., `SELECT * ORDER BY x`.
82+
Users can select a subset of columns in this expression to reduce storage
83+
footprint.
84+
- Users are comfortable with the associated increase in storage footprint and
85+
overhead of writing data twice. Test the impact on insertion speed and
86+
[evaluate the storage overhead](/data-compression/compression-in-clickhouse).
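
A minimal sketch of the "simple reordering" case recommended above, assuming a copy of the UK price dataset named `uk.uk_price_paid_with_projections` with the columns `town`, `district`, `price`, and `date`. The projection name `prj_by_town` is hypothetical. Only a subset of columns is selected, to keep the extra storage small.

```sql
-- prj_by_town is a hypothetical name; the projection stores a second copy
-- of the selected columns, sorted by town instead of the table's primary key.
ALTER TABLE uk.uk_price_paid_with_projections
    ADD PROJECTION prj_by_town
    (
        SELECT town, district, price, date
        ORDER BY town
    );

-- Build the projection for rows that already exist in the table.
ALTER TABLE uk.uk_price_paid_with_projections
    MATERIALIZE PROJECTION prj_by_town
    SETTINGS mutations_sync = 1;
```

Queries that filter on `town` and read only the selected columns are the ones most likely to benefit from a projection like this.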
87+
5088
## Examples {#examples}
5189

5290
### Filtering on columns which aren't in the primary key {#filtering-without-using-primary-keys}
@@ -338,43 +376,142 @@ projections: ['uk.uk_price_paid_with_projections.prj_obj_town_price']
338376
2 rows in set. Elapsed: 0.006 sec.
339377
```
340378

341-
## When to use Projections? {#when-to-use-projections}
379+
### Further examples {#further-examples}
342380

343-
Projections are an appealing feature for new users as they are automatically
344-
maintained as data is inserted. Furthermore, queries can just be sent to a
345-
single table where the projections are exploited where possible to speed up
346-
the response time.
381+
The following examples use the same UK price dataset, contrasting queries with and without projections.
347382

348-
This is in contrast to Materialized Views, where the user has to select the
349-
appropriate optimized target table or rewrite their query, depending on the
350-
filters. This places greater emphasis on user applications and increases
351-
client-side complexity.
383+
To preserve our original table (and its performance), we again create a copy of the table using `CREATE TABLE ... AS` and `INSERT INTO ... SELECT`.
352384

353-
Despite these advantages, projections come with some inherent limitations which
354-
users should be aware of and thus should be deployed sparingly.
385+
```sql
CREATE TABLE uk.uk_price_paid_with_projections_v2 AS uk.uk_price_paid;
INSERT INTO uk.uk_price_paid_with_projections_v2 SELECT * FROM uk.uk_price_paid;
```
355389

356-
- Projections don't allow using different TTL for the source table and the
357-
(hidden) target table, materialized views allow different TTLs.
358-
- Projections don't currently support `optimize_read_in_order` for the (hidden)
359-
target table.
360-
- Lightweight updates and deletes are not supported for tables with projections.
361-
- Materialized Views can be chained: the target table of one Materialized View
362-
can be the source table of another Materialized View, and so on. This is not
363-
possible with projections.
364-
- Projections don't support joins, but Materialized Views do.
365-
- Projections don't support filters (`WHERE` clause), but Materialized Views do.
390+
#### Build a Projection {#build-projection}
366391

367-
We recommend using projections when:
392+
Let's create an aggregate projection by the dimensions `toYear(date)`, `district`, and `town`:
393+
394+
```sql
ALTER TABLE uk.uk_price_paid_with_projections_v2
    ADD PROJECTION projection_by_year_district_town
    (
        SELECT
            toYear(date),
            district,
            town,
            avg(price),
            sum(price),
            count()
        GROUP BY
            toYear(date),
            district,
            town
    )
```
411+
412+
Populate the projection for existing data (without materializing it, the projection is built only for newly inserted data):
413+
414+
```sql
ALTER TABLE uk.uk_price_paid_with_projections_v2
    MATERIALIZE PROJECTION projection_by_year_district_town
    SETTINGS mutations_sync = 1
```
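
Once the projection has been materialized, one way to gauge its storage overhead is to inspect the `system.projection_parts` system table. This is a sketch that assumes the table is available in your ClickHouse version and that its `name` column holds the projection name; treat the exact column set as something to verify against your server.

```sql
-- Rows and on-disk size of each projection in the active parts of the table.
SELECT
    name,
    sum(rows) AS projection_rows,
    formatReadableSize(sum(bytes_on_disk)) AS size_on_disk
FROM system.projection_parts
WHERE database = 'uk'
  AND table = 'uk_price_paid_with_projections_v2'
  AND active
GROUP BY name
```

Because this projection aggregates by year, district, and town, it would be expected to hold far fewer rows than the source table.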
419+
420+
The following queries contrast performance with and without projections. To disable projections for a query, we set [`optimize_use_projections`](/operations/settings/settings#optimize_use_projections) to `0`; the setting is enabled by default.
421+
422+
#### Query 1. Average price per year {#average-price-projections}
423+
424+
```sql runnable
SELECT
    toYear(date) AS year,
    round(avg(price)) AS price,
    bar(price, 0, 1000000, 80)
FROM uk.uk_price_paid_with_projections_v2
GROUP BY year
ORDER BY year ASC
SETTINGS optimize_use_projections=0
```
434+
435+
```sql runnable
SELECT
    toYear(date) AS year,
    round(avg(price)) AS price,
    bar(price, 0, 1000000, 80)
FROM uk.uk_price_paid_with_projections_v2
GROUP BY year
ORDER BY year ASC
```
445+
The results should be the same, but the second query should run faster.
446+
447+
448+
#### Query 2. Average price per year in London {#average-price-london-projections}
449+
450+
```sql runnable
SELECT
    toYear(date) AS year,
    round(avg(price)) AS price,
    bar(price, 0, 2000000, 100)
FROM uk.uk_price_paid_with_projections_v2
WHERE town = 'LONDON'
GROUP BY year
ORDER BY year ASC
SETTINGS optimize_use_projections=0
```
461+
462+
463+
```sql runnable
SELECT
    toYear(date) AS year,
    round(avg(price)) AS price,
    bar(price, 0, 2000000, 100)
FROM uk.uk_price_paid_with_projections_v2
WHERE town = 'LONDON'
GROUP BY year
ORDER BY year ASC
```
473+
474+
#### Query 3. The most expensive neighborhoods {#most-expensive-neighborhoods-projections}
475+
476+
The condition (`date >= '2020-01-01'`) needs to be modified so that it matches the projection dimension (`toYear(date) >= 2020`):
477+
478+
```sql runnable
SELECT
    town,
    district,
    count() AS c,
    round(avg(price)) AS price,
    bar(price, 0, 5000000, 100)
FROM uk.uk_price_paid_with_projections_v2
WHERE toYear(date) >= 2020
GROUP BY
    town,
    district
HAVING c >= 100
ORDER BY price DESC
LIMIT 100
SETTINGS optimize_use_projections=0
```
495+
496+
```sql runnable
SELECT
    town,
    district,
    count() AS c,
    round(avg(price)) AS price,
    bar(price, 0, 5000000, 100)
FROM uk.uk_price_paid_with_projections_v2
WHERE toYear(date) >= 2020
GROUP BY
    town,
    district
HAVING c >= 100
ORDER BY price DESC
LIMIT 100
```
512+
513+
Again, the result is the same, but notice the improvement in query performance for the second query.
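
To confirm that the faster queries actually used the projection, one option is to look at the `projections` column of `system.query_log`. This is a sketch that assumes query logging is enabled on the server and that you have permission to run `SYSTEM FLUSH LOGS`.

```sql
-- Make sure recently finished queries are written to the log table.
SYSTEM FLUSH LOGS;

-- Most recent finished queries against the table,
-- with the projections each one used.
SELECT
    query,
    query_duration_ms,
    read_rows,
    projections
FROM system.query_log
WHERE type = 'QueryFinish'
  AND has(tables, 'uk.uk_price_paid_with_projections_v2')
ORDER BY event_time DESC
LIMIT 5
```

Queries run with `optimize_use_projections=0` should show an empty `projections` array, while the others should list `projection_by_year_district_town`.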
368514

369-
- A complete re-ordering of the data is required. While the expression in the
370-
projection can, in theory, use a `GROUP BY,` materialized views are more
371-
effective for maintaining aggregates. The query optimizer is also more likely
372-
to exploit projections that use a simple reordering, i.e., `SELECT * ORDER BY x`.
373-
Users can select a subset of columns in this expression to reduce storage
374-
footprint.
375-
- Users are comfortable with the associated increase in storage footprint and
376-
overhead of writing data twice. Test the impact on insertion speed and
377-
[evaluate the storage overhead](/data-compression/compression-in-clickhouse).
378515

379516
## Related content {#related-content}
380517
- [A Practical Introduction to Primary Indexes in ClickHouse](/guides/best-practices/sparse-primary-indexes#option-3-projections)
