docs/managing-data/core-concepts/partitions.md
+15-58
@@ -24,7 +24,6 @@ Partitioning can be enabled when a table is initially defined via the [PARTITION
24
24
25
25
To illustrate this, we [enhance](https://sql.clickhouse.com/?query=U0hPVyBDUkVBVEUgVEFCTEUgdWsudWtfcHJpY2VfcGFpZF9zaW1wbGVfcGFydGl0aW9uZWQ&run_query=true&tab=results) the [What are table parts](/parts) example table by adding a `PARTITION BY toStartOfMonth(date)` clause, which organizes the table's data parts based on the months of property sales:
26
26
27
-
28
27
```sql
29
28
CREATE TABLE uk.uk_price_paid_simple_partitioned
30
29
(
@@ -67,26 +66,15 @@ As sketched in the diagram above, parts belonging to different partitions are ne
67
66
68
67
You can [query](https://sql.clickhouse.com/?query=U0VMRUNUIERJU1RJTkNUIF9wYXJ0aXRpb25fdmFsdWUgQVMgcGFydGl0aW9uCkZST00gdWsudWtfcHJpY2VfcGFpZF9zaW1wbGVfcGFydGl0aW9uZWQKT1JERVIgQlkgcGFydGl0aW9uIEFTQw&run_query=true&tab=results) the list of all existing unique partitions of our example table by using the [virtual column](/engines/table-engines#table_engines-virtual_columns) `_partition_value`:
69
68
70
-
```sql
69
+
```sql runnable
71
70
SELECT DISTINCT _partition_value AS partition
72
71
FROM uk.uk_price_paid_simple_partitioned
73
72
ORDER BY partition ASC;
74
-
75
-
76
-
┌─partition──────┐
77
-
1. │ ('1995-01-01') │
78
-
2. │ ('1995-02-01') │
79
-
3. │ ('1995-03-01') │
80
-
...
81
-
304. │ ('2021-04-01') │
82
-
305. │ ('2021-05-01') │
83
-
306. │ ('2021-06-01') │
84
-
└────────────────┘
85
73
```
86
74
87
75
Alternatively, ClickHouse tracks all parts and partitions of all tables in the [system.parts](/operations/system-tables/parts) system table. The following query [returns](https://sql.clickhouse.com/?query=U0VMRUNUCiAgICBwYXJ0aXRpb24sCiAgICBjb3VudCgpIEFTIHBhcnRzLAogICAgc3VtKHJvd3MpIEFTIHJvd3MKRlJPTSBzeXN0ZW0ucGFydHMKV0hFUkUgKGRhdGFiYXNlID0gJ3VrJykgQU5EIChgdGFibGVgID0gJ3VrX3ByaWNlX3BhaWRfc2ltcGxlX3BhcnRpdGlvbmVkJykgQU5EIGFjdGl2ZQpHUk9VUCBCWSBwYXJ0aXRpb24KT1JERVIgQlkgcGFydGl0aW9uIEFTQzs&run_query=true&tab=results) the list of all partitions of our example table, along with the current number of active parts and the total number of rows per partition:
88
76
89
-
```sql
77
+
```sql runnable
90
78
SELECT
91
79
partition,
92
80
count() AS parts,
@@ -95,17 +83,6 @@ FROM system.parts
95
83
WHERE (database = 'uk') AND (`table` = 'uk_price_paid_simple_partitioned') AND active
96
84
GROUP BY partition
97
85
ORDER BY partition ASC;
98
-
99
-
100
-
┌─partition──┬─parts─┬───rows─┐
101
-
1. │ 1995-01-01 │ 1 │ 50473 │
102
-
2. │ 1995-02-01 │ 1 │ 50840 │
103
-
3. │ 1995-03-01 │ 1 │ 71276 │
104
-
...
105
-
304. │ 2021-04-01 │ 3 │ 23160 │
106
-
305. │ 2021-05-01 │ 3 │ 17607 │
107
-
306. │ 2021-06-01 │ 3 │ 5652 │
108
-
└─partition──┴─parts─┴───rows─┘
109
86
```
110
87
111
88
@@ -152,20 +129,12 @@ TTL date + INTERVAL 12 MONTH TO VOLUME 'slow_but_cheap';
152
129
153
130
Partitions can assist with query performance, but this depends heavily on the access patterns. If queries target only a few partitions (ideally one), performance can potentially improve. This is typically only useful if the partitioning key is not in the primary key and you are filtering by it, as shown in the example query below.
154
131
155
-
```sql
132
+
```sql runnable
156
133
SELECT MAX(price) AS highest_price
157
-
FROM uk_price_paid_simple_partitioned
134
+
FROM uk.uk_price_paid_simple_partitioned
158
135
WHERE date >= '2020-12-01'
159
136
  AND date <= '2020-12-31'
160
137
  AND town = 'LONDON';
161
-
162
-
163
-
┌─highest_price─┐
164
-
1. │ 296280000 │ -- 296.28 million
165
-
└───────────────┘
166
-
167
-
1 row in set. Elapsed: 0.006 sec. Processed 8.19 thousand rows, 57.34 KB (1.36 million rows/s., 9.49 MB/s.)
168
-
Peak memory usage: 2.73 MiB.
169
138
```
170
139
171
140
The query runs over our example table from above and [calculates](https://sql.clickhouse.com/?query=U0VMRUNUIE1BWChwcmljZSkgQVMgaGlnaGVzdF9wcmljZQpGUk9NIHVrLnVrX3ByaWNlX3BhaWRfc2ltcGxlX3BhcnRpdGlvbmVkCldIRVJFIGRhdGUgPj0gJzIwMjAtMTItMDEnCiAgQU5EIGRhdGUgPD0gJzIwMjAtMTItMzEnCiAgQU5EIHRvd24gPSAnTE9ORE9OJzs&run_query=true&tab=results) the highest price of all sold properties in London in December 2020 by filtering on both a column (`date`) used in the table's partition key and on a column (`town`) used in the table's primary key (and `date` is not part of the primary key).
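As a quick sanity check of the date filter, the following sketch reuses the `_partition_value` virtual column introduced earlier to list which partitions the December 2020 date range touches. It assumes the same `uk.uk_price_paid_simple_partitioned` table and is only an illustration, not part of the original example:

```sql
-- Sketch: list the partitions touched by the date filter of the example query.
-- Because the table is partitioned by toStartOfMonth(date), all rows of one
-- calendar month share the same partition value.
SELECT DISTINCT _partition_value AS partition
FROM uk.uk_price_paid_simple_partitioned
WHERE date >= '2020-12-01'
  AND date <= '2020-12-31'
ORDER BY partition ASC;
```

If the filter stays within one calendar month, this should list a single partition, which is the situation in which partition pruning helps most.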
@@ -182,10 +151,10 @@ ClickHouse processes that query by applying a sequence of pruning techniques to
182
151
183
152
We can observe these data pruning steps by [inspecting](https://sql.clickhouse.com/?query=RVhQTEFJTiBpbmRleGVzID0gMQpTRUxFQ1QgTUFYKHByaWNlKSBBUyBoaWdoZXN0X3ByaWNlCkZST00gdWsudWtfcHJpY2VfcGFpZF9zaW1wbGVfcGFydGl0aW9uZWQKV0hFUkUgZGF0ZSA-PSAnMjAyMC0xMi0wMScKICBBTkQgZGF0ZSA8PSAnMjAyMC0xMi0zMScKICBBTkQgdG93biA9ICdMT05ET04nOw&run_query=true&tab=results) the physical query execution plan for our example query from above via an [EXPLAIN](/sql-reference/statements/explain) clause:
184
153
185
-
```sql
154
+
```sql style="fontSize:13px"
186
155
EXPLAIN indexes = 1
187
156
SELECT MAX(price) AS highest_price
188
-
FROM uk_price_paid_simple_partitioned
157
+
FROM uk.uk_price_paid_simple_partitioned
189
158
WHERE date >= '2020-12-01'
190
159
  AND date <= '2020-12-31'
191
160
  AND town = 'LONDON';
@@ -240,37 +209,27 @@ With partitioning, the data is usually distributed across more data parts, which
240
209
241
210
We can demonstrate this by running the same query over both the [What are table parts](/parts) example table (without partitioning enabled), and our current example table from above (with partitioning enabled). Both tables [contain](https://sql.clickhouse.com/?query=U0VMRUNUCiAgICB0YWJsZSwKICAgIHN1bShyb3dzKSBBUyByb3dzCkZST00gc3lzdGVtLnBhcnRzCldIRVJFIChkYXRhYmFzZSA9ICd1aycpIEFORCAoYHRhYmxlYCBJTiBbJ3VrX3ByaWNlX3BhaWRfc2ltcGxlJywgJ3VrX3ByaWNlX3BhaWRfc2ltcGxlX3BhcnRpdGlvbmVkJ10pIEFORCBhY3RpdmUKR1JPVVAgQlkgdGFibGU7&run_query=true&tab=results) the same data and number of rows:
242
211
243
-
```sql
212
+
```sql runnable
244
213
SELECT
245
214
table,
246
215
sum(rows) AS rows
247
216
FROM system.parts
248
217
WHERE (database = 'uk') AND (table IN ['uk_price_paid_simple', 'uk_price_paid_simple_partitioned']) AND active
However, the table with partitions enabled [has](https://sql.clickhouse.com/?query=U0VMRUNUCiAgICB0YWJsZSwKICAgIGNvdW50KCkgQVMgcGFydHMKRlJPTSBzeXN0ZW0ucGFydHMKV0hFUkUgKGRhdGFiYXNlID0gJ3VrJykgQU5EIChgdGFibGVgIElOIFsndWtfcHJpY2VfcGFpZF9zaW1wbGUnLCAndWtfcHJpY2VfcGFpZF9zaW1wbGVfcGFydGl0aW9uZWQnXSkgQU5EIGFjdGl2ZQpHUk9VUCBCWSB0YWJsZTs&run_query=true&tab=results) more active [data parts](/parts) because, as mentioned above, ClickHouse only [merges](/parts) data parts within, but not across, partitions:
258
222
259
-
```sql
223
+
```sql runnable
260
224
SELECT
261
225
table,
262
226
count() AS parts
263
227
FROM system.parts
264
228
WHERE (database = 'uk') AND (table IN ['uk_price_paid_simple', 'uk_price_paid_simple_partitioned']) AND active
265
229
GROUP BY table;
266
230
267
-
268
-
┌─table────────────────────────────┬─parts─┐
269
-
1. │ uk_price_paid_simple │ 1 │
270
-
2. │ uk_price_paid_simple_partitioned │ 436 │
271
-
└──────────────────────────────────┴───────┘
272
231
```
273
-
As shown further above, the partitioned table `uk_price_paid_simple_partitioned` has 306 partitions, and therefore at least 306 active data parts. Whereas for our non-partitioned table `uk_price_paid_simple` all [initial](/parts) data parts could be merged into a single active part by background merges.
232
+
As shown further above, the partitioned table `uk_price_paid_simple_partitioned` has 306 partitions, and therefore at least 306 active data parts, whereas for our non-partitioned table `uk_price_paid_simple` all [initial](/parts) data parts could be merged into a single active part by background merges.
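The relationship between partitions and active parts can also be checked directly. The sketch below is an illustrative variation of the `system.parts` queries used earlier, not part of the original example. It lists the partitions of the example table that currently hold more than one active part, i.e. partitions whose parts have not (yet) been merged into one:

```sql
-- Sketch: find partitions that currently hold more than one active data part.
-- Uses only system.parts columns already shown in this guide
-- (database, table, partition, active).
SELECT
    partition,
    count() AS parts
FROM system.parts
WHERE (database = 'uk') AND (`table` = 'uk_price_paid_simple_partitioned') AND active
GROUP BY partition
HAVING parts > 1
ORDER BY parts DESC, partition ASC;
```

The result depends on the current state of background merges, so it may differ between runs.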
274
233
275
234
276
235
When we [check](https://sql.clickhouse.com/?query=RVhQTEFJTiBpbmRleGVzID0gMQpTRUxFQ1QgTUFYKHByaWNlKSBBUyBoaWdoZXN0X3ByaWNlCkZST00gdWsudWtfcHJpY2VfcGFpZF9zaW1wbGVfcGFydGl0aW9uZWQKV0hFUkUgdG93biA9ICdMT05ET04nOw&run_query=true&tab=results) the physical query execution plan with an [EXPLAIN](/sql-reference/statements/explain) clause for our example query from above, run without the partition filter over the partitioned table, we can see in rows 19 and 20 of the output below that ClickHouse identified 671 out of 3257 existing [granules](/guides/best-practices/sparse-primary-indexes#data-is-organized-into-granules-for-parallel-data-processing) (blocks of rows), spread over 431 out of 436 existing active data parts, that potentially contain rows matching the query's filter and will therefore be scanned and processed by the query engine:
@@ -338,10 +297,9 @@ SELECT MAX(price) AS highest_price
338
297
FROM uk.uk_price_paid_simple_partitioned
339
298
WHERE town = 'LONDON';
340
299
341
-
342
-
┌─highest_price─┐
343
-
1. │ 594300000 │ -- 594.30 million
344
-
└───────────────┘
300
+
┌─highest_price─┐
301
+
│ 594300000 │ -- 594.30 million
302
+
└───────────────┘
345
303
346
304
1 row in set. Elapsed: 0.090 sec. Processed 5.48 million rows, 27.95 MB (60.66 million rows/s., 309.51 MB/s.)
347
305
Peak memory usage: 163.44 MiB.
@@ -355,10 +313,9 @@ SELECT MAX(price) AS highest_price
355
313
FROM uk.uk_price_paid_simple
356
314
WHERE town = 'LONDON';
357
315
358
-
359
-
┌─highest_price─┐
360
-
1. │ 594300000 │ -- 594.30 million
361
-
└───────────────┘
316
+
┌─highest_price─┐
317
+
│ 594300000 │ -- 594.30 million
318
+
└───────────────┘
362
319
363
320
1 row in set. Elapsed: 0.012 sec. Processed 1.97 million rows, 9.87 MB (162.23 million rows/s., 811.17 MB/s.)