Commit a107725

Merge branch 'pingcap:master' into master
2 parents b2ebd64 + 0f5643a commit a107725

18 files changed: +344 -86 lines changed

OWNERS

Lines changed: 8 additions & 35 deletions
Original file line number | Diff line number | Diff line change
@@ -1,35 +1,8 @@
1-
# See the OWNERS docs at https://go.k8s.io/owners
2-
approvers:
3-
- breezewish
4-
- csuzhangxc
5-
- hfxsd
6-
- Icemap
7-
- jackysp
8-
- kissmydb
9-
- lance6716
10-
- lilin90
11-
- Oreoxmt
12-
- overvenus
13-
- qiancai
14-
- tangenta
15-
reviewers:
16-
- 3pointer
17-
- amyangfei
18-
- anotherrachel
19-
- aylei
20-
- crazycs520
21-
- dveeden
22-
- ericsyh
23-
- glkappe
24-
- GMHDBJD
25-
- Joyinqin
26-
- junlan-zhang
27-
- KanShiori
28-
- lucklove
29-
- lysu
30-
- ngaut
31-
- superlzs0476
32-
- tiancaiamao
33-
- weekface
34-
- Yisaer
35-
- zimulala
1+
# See the OWNERS docs at https://www.kubernetes.dev/docs/guide/owners/#owners
2+
# The members of 'sig-community-*' are synced from memberships defined in repository: https://github.com/pingcap/community.
3+
filters:
4+
.*:
5+
approvers:
6+
- sig-community-approvers
7+
reviewers:
8+
- sig-community-reviewers

OWNERS_ALIASES

Lines changed: 36 additions & 0 deletions
@@ -1,3 +1,5 @@
1+
# See the OWNERS docs at https://www.kubernetes.dev/docs/guide/owners/#owners_aliases
2+
# The members of 'sig-community-*' are synced from memberships defined in repository: https://github.com/pingcap/community.
13
aliases:
24
sig-develop-docs-approvers:
35
- Oreoxmt
@@ -13,3 +15,37 @@ aliases:
1315
- Mini256
1416
- wd0517
1517
- it2911
18+
sig-community-reviewers:
19+
- 3pointer
20+
- GMHDBJD
21+
- Joyinqin
22+
- KanShiori
23+
- Yisaer
24+
- amyangfei
25+
- anotherrachel
26+
- aylei
27+
- crazycs520
28+
- dveeden
29+
- ericsyh
30+
- glkappe
31+
- junlan-zhang
32+
- lucklove
33+
- lysu
34+
- ngaut
35+
- superlzs0476
36+
- tiancaiamao
37+
- weekface
38+
- zimulala
39+
sig-community-approvers:
40+
- Icemap
41+
- Oreoxmt
42+
- breezewish
43+
- csuzhangxc
44+
- hfxsd
45+
- jackysp
46+
- kissmydb
47+
- lance6716
48+
- lilin90
49+
- overvenus
50+
- qiancai
51+
- tangenta

TOC-tidb-cloud.md

Lines changed: 1 addition & 0 deletions
@@ -839,6 +839,7 @@
839839
- [Table Filter](/table-filter.md)
840840
- [URI Formats of External Storage Services](/external-storage-uri.md)
841841
- [DDL Execution Principles and Best Practices](/ddl-introduction.md)
842+
- [`ANALYZE` Embedded in DDL Statements](/ddl_embedded_analyze.md)
842843
- [Batch Processing](/batch-processing.md)
843844
- [Troubleshoot Inconsistency Between Data and Indexes](/troubleshoot-data-inconsistency-errors.md)
844845
- [Notifications](/tidb-cloud/notifications.md)

TOC.md

Lines changed: 1 addition & 0 deletions
@@ -1088,6 +1088,7 @@
10881088
- [URI Formats of External Storage Services](/external-storage-uri.md)
10891089
- [TiDB Workload Repository](/workload-repository.md)
10901090
- [Interaction Test on Online Workloads and `ADD INDEX` Operations](/benchmark/online-workloads-and-add-index-operations.md)
1091+
- [`ANALYZE` Embedded in DDL Statements](/ddl_embedded_analyze.md)
10911092
- FAQs
10921093
- [FAQ Summary](/faq/faq-overview.md)
10931094
- [TiDB FAQs](/faq/tidb-faq.md)

br/br-pitr-guide.md

Lines changed: 2 additions & 0 deletions
@@ -117,6 +117,8 @@ Restore KV Files <--------------------------------------------------------------
117117
*** ["restore log success summary"] [total-take=xxx.xx] [restore-from={TS}] [restore-to={TS}] [total-kv-count=xxx] [total-size=xxx]
118118
```
119119

120+
During data restore, the table mode of the target table is automatically set to `restore`. Tables in `restore` mode do not allow any read or write operations. After data restore is complete, the table mode automatically switches back to `normal`, and you can read and write the table normally. This mechanism ensures task stability and data consistency throughout the restore process.
121+
120122
## Clean up outdated data
121123

122124
As described in the [Usage Overview of TiDB Backup and Restore](/br/br-use-overview.md):

br/br-snapshot-guide.md

Lines changed: 2 additions & 0 deletions
@@ -103,6 +103,8 @@ Restore Pipeline <--------------------------------------------------------------
103103
*** ["Full Restore success summary"] [total-ranges=20] [ranges-succeed=20] [ranges-failed=0] [merge-ranges=7.546971ms] [split-region=343.594072ms] [restore-files=1.57662s] [default-CF-files=6] [write-CF-files=14] [split-keys=9] [total-take=4.344617542s] [total-kv=5] [total-kv-size=327B] [average-speed=75.27B/s] [restore-data-size(after-compressed)=4.813kB] [Size=4813] [BackupTS=435844901803917314]
104104
```
105105

106+
During data restore, the table mode of the target table is automatically set to `restore`. Tables in `restore` mode do not allow any read or write operations. After data restore is complete, the table mode automatically switches back to `normal`, and you can read and write the table normally. This mechanism ensures task stability and data consistency throughout the restore process.
107+
106108
### Restore a database or a table
107109

108110
BR supports restoring partial data of a specified database or table from backup data. This feature allows you to filter out unwanted data and restore only a specific database or table.

ddl_embedded_analyze.md

Lines changed: 177 additions & 0 deletions
@@ -0,0 +1,177 @@
1+
---
2+
title: "`ANALYZE` Embedded in DDL Statements"
3+
summary: This document describes the `ANALYZE` feature embedded in DDL statements for newly created or reorganized indexes, which ensures that statistics for new indexes are updated promptly.
4+
---
5+
6+
# `ANALYZE` Embedded in DDL Statements <span class="version-mark">Introduced in v8.5.4 and v9.0.0</span>
7+
8+
This document describes the `ANALYZE` feature embedded in the following two types of DDL statements:
9+
10+
- DDL statements that create new indexes: [`ADD INDEX`](/sql-statements/sql-statement-add-index.md)
11+
- DDL statements that reorganize existing indexes: [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) and [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md)
12+
13+
When this feature is enabled, TiDB automatically runs an `ANALYZE` (statistics collection) operation before the new or reorganized index becomes visible to users. This prevents inaccurate optimizer estimates and potential plan changes caused by temporarily unavailable statistics after index creation or reorganization.
14+
15+
## Usage scenarios
16+
17+
In scenarios where DDL operations alternately add or modify indexes, existing stable queries might suffer from estimation bias because the new index lacks statistics, causing the optimizer to choose suboptimal plans. For more information, see [Issue #57948](https://github.com/pingcap/tidb/issues/57948).
18+
19+
For example:
20+
21+
```sql
22+
CREATE TABLE t (a INT, b INT);
23+
INSERT INTO t VALUES (1, 1), (2, 2), (3, 3);
24+
INSERT INTO t SELECT * FROM t; -- Repeat N times; each run doubles the row count
25+
26+
ALTER TABLE t ADD INDEX idx_a (a);
27+
28+
EXPLAIN SELECT * FROM t WHERE a > 4;
29+
```
30+
31+
```
32+
+-------------------------+-----------+-----------+---------------+--------------------------------+
33+
| id | estRows | task | access object | operator info |
34+
+-------------------------+-----------+-----------+---------------+--------------------------------+
35+
| TableReader_8 | 131072.00 | root | | data:Selection_7 |
36+
| └─Selection_7 | 131072.00 | cop[tikv] | | gt(test.t.a, 4) |
37+
| └─TableFullScan_6 | 393216.00 | cop[tikv] | table:t | keep order:false, stats:pseudo |
38+
+-------------------------+-----------+-----------+---------------+--------------------------------+
39+
3 rows in set (0.002 sec)
40+
```
41+
42+
In the preceding plan, the newly created index has no statistics yet, so TiDB can only rely on heuristic rules for path estimation. Unless the index access path requires no table lookup and has a significantly lower cost, the optimizer tends to keep the more stable existing path, which in this example is a full table scan. However, given the actual data distribution (column `a` contains only the values 1, 2, and 3), `t.a > 4` returns 0 rows, so using the new index `idx_a` would let the query locate the relevant range quickly and avoid the full table scan. In this example, the plan is not optimal because statistics are not collected promptly after the DDL statement creates the index, but query performance does not regress sharply because the optimizer keeps the original plan. In other cases, according to [Issue #57948](https://github.com/pingcap/tidb/issues/57948), heuristics might compare the old and new indexes unreasonably, prune the index that the original plan relies on, and ultimately fall back to a full table scan.
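
When embedded `ANALYZE` is not in use, statistics can be collected explicitly right after the index is created. The following is a minimal sketch that reuses the table `t` and index `idx_a` from the preceding example. The resulting plan depends on your data and TiDB version, so no output is shown.

```sql
-- Collect statistics for the new index manually
-- (assumes table t and index idx_a from the preceding example).
ANALYZE TABLE t INDEX idx_a;

-- Re-check the plan. After statistics are loaded, the operator info
-- is expected to stop reporting stats:pseudo.
EXPLAIN SELECT * FROM t WHERE a > 4;
```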
43+
44+
Starting from v8.5.0, TiDB has improved heuristic comparisons between indexes and behaviors when statistics are missing. Still, in some complex scenarios, embedding `ANALYZE` in DDL is the best way to prevent plan changes. You can control whether to run embedded `ANALYZE` during index creation or reorganization with the system variable [`tidb_stats_update_during_ddl`](/system-variables.md#tidb_stats_update_during_ddl-new-in-v854-and-v900). The default value is `OFF`.
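
The following is a minimal sketch of checking and toggling the variable. It uses the same `SET @@...` form as the examples in this document. Which scopes the variable supports is not covered here, so check the system variable reference before relying on a specific scope.

```sql
-- Check the current value (OFF by default).
SELECT @@tidb_stats_update_during_ddl;

-- Enable embedded ANALYZE for index creation and reorganization.
SET @@tidb_stats_update_during_ddl = 1;

-- Disable it again to restore the default behavior.
SET @@tidb_stats_update_during_ddl = 0;
```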
45+
46+
## `ADD INDEX` DDL
47+
48+
When `tidb_stats_update_during_ddl` is `ON`, executing [`ADD INDEX`](/sql-statements/sql-statement-add-index.md) automatically runs an embedded `ANALYZE` operation after the Reorg phase finishes. This `ANALYZE` operation collects statistics for the newly created index before the index becomes visible to users, and then `ADD INDEX` proceeds with its remaining phases.
49+
50+
Considering that `ANALYZE` can take time, TiDB sets a timeout threshold based on the execution time of the first Reorg. If `ANALYZE` times out, `ADD INDEX` stops waiting synchronously for `ANALYZE` to finish and continues the subsequent process, making the index visible earlier to users. This means the index statistics will be updated after `ANALYZE` completes asynchronously.
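
If `ANALYZE` times out and continues asynchronously, you might want to confirm later that statistics have arrived. The following is a minimal sketch, assuming the table is named `t` as in the examples in this document. Whether and how embedded `ANALYZE` tasks are labeled in the `SHOW ANALYZE STATUS` output is an assumption and is not described here.

```sql
-- List ANALYZE tasks recorded for table t and check their State column.
SHOW ANALYZE STATUS WHERE table_name = 't';

-- Check when the statistics metadata of table t was last updated.
SHOW STATS_META WHERE table_name = 't';
```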
51+
52+
For example:
53+
54+
```sql
55+
CREATE TABLE t (a INT, b INT, c INT);
56+
Query OK, 0 rows affected (0.011 sec)
57+
58+
INSERT INTO t VALUES (1, 1, 1), (2, 2, 2), (3, 3, 3);
59+
Query OK, 3 rows affected (0.003 sec)
60+
Records: 3 Duplicates: 0 Warnings: 0
61+
62+
SET @@tidb_stats_update_during_ddl = 1;
63+
Query OK, 0 rows affected (0.001 sec)
64+
65+
ALTER TABLE t ADD INDEX idx (a, b);
66+
Query OK, 0 rows affected (0.049 sec)
67+
```
68+
69+
```sql
70+
EXPLAIN SELECT a FROM t WHERE a > 1;
71+
```
72+
73+
```
74+
+------------------------+---------+-----------+--------------------------+----------------------------------+
75+
| id | estRows | task | access object | operator info |
76+
+------------------------+---------+-----------+--------------------------+----------------------------------+
77+
| IndexReader_7 | 4.00 | root | | index:IndexRangeScan_6 |
78+
| └─IndexRangeScan_6 | 4.00 | cop[tikv] | table:t, index:idx(a, b) | range:(1,+inf], keep order:false |
79+
+------------------------+---------+-----------+--------------------------+----------------------------------+
80+
2 rows in set (0.002 sec)
81+
```
82+
83+
```sql
84+
SHOW STATS_HISTOGRAMS WHERE table_name = "t";
85+
```
86+
87+
```
88+
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+
89+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Update_time | Distinct_count | Null_count | Avg_col_size | Correlation | Load_status | Total_mem_usage | Hist_mem_usage | Topn_mem_usage | Cms_mem_usage |
90+
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+
91+
| test | t | | a | 0 | 2025-10-30 20:17:57 | 3 | 0 | 0.5 | 1 | allLoaded | 155 | 0 | 155 | 0 |
92+
| test | t | | idx | 1 | 2025-10-30 20:17:57 | 3 | 0 | 0 | 0 | allLoaded | 182 | 0 | 182 | 0 |
93+
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+
94+
2 rows in set (0.013 sec)
95+
```
96+
97+
```sql
98+
ADMIN SHOW DDL JOBS 1;
99+
```
100+
101+
```
102+
+--------+---------+--------------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+----------------------------------------+
103+
| JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME | START_TIME | END_TIME | STATE | COMMENTS |
104+
+--------+---------+--------------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+----------------------------------------+
105+
| 151 | test | t | add index | write reorganization | 2 | 148 | 6291456 | 2025-10-29 00:14:47.181000 | 2025-10-29 00:14:47.183000 | NULL | running | analyzing, txn-merge, max_node_count=3 |
106+
+--------+---------+--------------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+----------------------------------------+
107+
1 rows in set (0.001 sec)
108+
```
109+
110+
In the preceding `ADD INDEX` example, `tidb_stats_update_during_ddl` is `ON`, so statistics for the index `idx` are automatically collected and loaded into memory as part of the `ADD INDEX` DDL statement. The `SHOW STATS_HISTOGRAMS` output confirms this, and the `EXPLAIN` output shows that the optimizer can immediately use these statistics for the index range scan. If index creation and `ANALYZE` take a long time, you can check the DDL job status by executing `ADMIN SHOW DDL JOBS`. If the `COMMENTS` column in the output contains `analyzing`, the DDL job is collecting statistics.
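
On a busy cluster, the job of interest might not be the most recent one. The following is a minimal sketch that narrows the output to running jobs. The job count `10` is an arbitrary value chosen for illustration.

```sql
-- Show up to 10 recent DDL jobs that are still running,
-- then look for `analyzing` in the COMMENTS column.
ADMIN SHOW DDL JOBS 10 WHERE state = 'running';
```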
111+
112+
## DDL for reorganizing existing indexes
113+
114+
When `tidb_stats_update_during_ddl` is `ON`, executing [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) or [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) that reorganizes an index will also run an embedded `ANALYZE` operation after the Reorg phase completes. The mechanism is the same as for `ADD INDEX`:
115+
116+
- Start collecting statistics before the index becomes visible.
117+
- If `ANALYZE` times out, [`MODIFY COLUMN`](/sql-statements/sql-statement-modify-column.md) or [`CHANGE COLUMN`](/sql-statements/sql-statement-change-column.md) stops waiting synchronously for `ANALYZE` to finish and continues the subsequent process, making the index visible to users earlier. This means that the index statistics will be updated when `ANALYZE` finishes asynchronously.
118+
119+
For example:
120+
121+
```sql
122+
CREATE TABLE s (a VARCHAR(10), INDEX idx (a));
123+
Query OK, 0 rows affected (0.012 sec)
124+
125+
INSERT INTO s VALUES (1), (2), (3);
126+
Query OK, 3 rows affected (0.003 sec)
127+
Records: 3 Duplicates: 0 Warnings: 0
128+
129+
SET @@tidb_stats_update_during_ddl = 1;
130+
Query OK, 0 rows affected (0.001 sec)
131+
132+
ALTER TABLE s MODIFY COLUMN a INT;
133+
Query OK, 0 rows affected (0.056 sec)
134+
135+
EXPLAIN SELECT * FROM s WHERE a > 1;
136+
```
137+
138+
```
139+
+------------------------+---------+-----------+-----------------------+----------------------------------+
140+
| id | estRows | task | access object | operator info |
141+
+------------------------+---------+-----------+-----------------------+----------------------------------+
142+
| IndexReader_7 | 2.00 | root | | index:IndexRangeScan_6 |
143+
| └─IndexRangeScan_6 | 2.00 | cop[tikv] | table:s, index:idx(a) | range:(1,+inf], keep order:false |
144+
+------------------------+---------+-----------+-----------------------+----------------------------------+
145+
2 rows in set (0.005 sec)
146+
```
147+
148+
```sql
149+
SHOW STATS_HISTOGRAMS WHERE table_name = "s";
150+
```
151+
152+
```
153+
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+
154+
| Db_name | Table_name | Partition_name | Column_name | Is_index | Update_time | Distinct_count | Null_count | Avg_col_size | Correlation | Load_status | Total_mem_usage | Hist_mem_usage | Topn_mem_usage | Cms_mem_usage |
155+
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+
156+
| test | s | | a | 0 | 2025-10-30 20:10:18 | 3 | 0 | 2 | 1 | allLoaded | 158 | 0 | 158 | 0 |
157+
| test | s | | a | 0 | 2025-10-30 20:10:18 | 3 | 0 | 1 | 1 | allLoaded | 155 | 0 | 155 | 0 |
158+
| test | s | | idx | 1 | 2025-10-30 20:10:18 | 3 | 0 | 0 | 0 | allLoaded | 158 | 0 | 158 | 0 |
159+
| test | s | | idx | 1 | 2025-10-30 20:10:18 | 3 | 0 | 0 | 0 | allLoaded | 155 | 0 | 155 | 0 |
160+
+---------+------------+----------------+-------------+----------+---------------------+----------------+------------+--------------+-------------+-------------+-----------------+----------------+----------------+---------------+
161+
4 rows in set (0.008 sec)
162+
```
163+
164+
```sql
165+
ADMIN SHOW DDL JOBS 1;
166+
```
167+
168+
```
169+
+--------+---------+------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+-----------------------------+
170+
| JOB_ID | DB_NAME | TABLE_NAME | JOB_TYPE | SCHEMA_STATE | SCHEMA_ID | TABLE_ID | ROW_COUNT | CREATE_TIME | START_TIME | END_TIME | STATE | COMMENTS |
171+
+--------+---------+------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+-----------------------------+
172+
| 153 | test | s | modify column | write reorganization | 2 | 148 | 12582912 | 2025-10-29 00:26:49.240000 | 2025-10-29 00:26:49.244000 | NULL | running | analyzing |
173+
+--------+---------+------------------+---------------+----------------------+-----------+----------+-----------+----------------------------+----------------------------+----------------------------+---------+-----------------------------+
174+
1 rows in set (0.001 sec)
175+
```
176+
177+
In the preceding `MODIFY COLUMN` example, `tidb_stats_update_during_ddl` is `ON`, so statistics for the index `idx` are automatically collected and loaded into memory as part of the `MODIFY COLUMN` DDL statement. The `SHOW STATS_HISTOGRAMS` output confirms this, and the `EXPLAIN` output shows that the optimizer can immediately use these statistics for the index range scan. If index reorganization and `ANALYZE` take a long time, you can check the DDL job status by executing `ADMIN SHOW DDL JOBS`. If the `COMMENTS` column in the output contains `analyzing`, the DDL job is collecting statistics.
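
The document states that `CHANGE COLUMN` follows the same mechanism. The following is a minimal sketch of the equivalent flow. The table name `s2` and the new column name `a_int` are hypothetical names chosen for illustration, and the output is omitted because it depends on your environment.

```sql
-- Hypothetical table mirroring the MODIFY COLUMN example.
CREATE TABLE s2 (a VARCHAR(10), INDEX idx (a));
INSERT INTO s2 VALUES ('1'), ('2'), ('3');

SET @@tidb_stats_update_during_ddl = 1;

-- Rename the column and change its type, which reorganizes the index idx.
ALTER TABLE s2 CHANGE COLUMN a a_int INT;

-- Verify whether statistics for the reorganized index are present.
SHOW STATS_HISTOGRAMS WHERE table_name = 's2';
```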
