Commit d460824

Add zone config troubleshooting guide v1
Fixes DOC-9210

Summary of changes:

- Add a new page, 'Troubleshoot Replication Zones', to _The ZoneConfigonomicon (tm)_
- Update the 'Replication controls' page with more detailed info re: zone config inheritance hierarchy and behavior
- Fix incorrect statements on the `ALTER RANGE` page since they're needed to map from range IDs returned by the critical nodes endpoint (mentioned in 'Troubleshoot Replication Zones') to actual schema objects
- Add moar links (tm) from various zone config-related pages to the new troubleshooting guide and amongst themselves
- Add a note to various zone config-related docs saying "most users should not do manual zone config changes, see Multi-region SQL and Zone Config Extensions instead"

1 parent 9c326ff commit d460824

32 files changed: +545 -50 lines changed
Original file line number | Diff line number | Diff line change
@@ -0,0 +1 @@
1+
[Zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}) present on the destination cluster prior to a restore will be **overwritten** during a [cluster restore]({% link {{ page.version.version }}/restore.md %}#full-cluster) with the zone configurations from the [backed up cluster]({% link {{ page.version.version }}/backup.md %}#back-up-a-cluster). If there were no customized zone configurations on the cluster when the backup was taken, then after the restore the destination cluster will use the zone configuration from the [`RANGE DEFAULT` configuration]({% link {{ page.version.version }}/configure-replication-zones.md %}#view-the-default-replication-zone).
@@ -0,0 +1 @@
1+
For instructions showing how to troubleshoot replication zones, see [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %}).

Diff for: src/current/_includes/v24.3/sidebar-data/troubleshooting.json

+6
@@ -56,6 +56,12 @@
5656
"/${VERSION}/query-replication-reports.html"
5757
]
5858
},
59+
{
60+
"title": "Troubleshoot Replication Zones",
61+
"urls": [
62+
"/${VERSION}/troubleshoot-replication-zones.html"
63+
]
64+
},
5965
{
6066
"title": "Benchmarking",
6167
"items": [
@@ -0,0 +1,3 @@
1+
Cockroach Labs {% if page.name != "configure-replication-zones.md" %} [does not recommend modifying zone configurations manually]({% link {{ page.version.version }}/configure-replication-zones.md %}#why-manual-zone-config-management-is-not-recommended) {% else %} [does not recommend modifying zone configurations manually](#why-manual-zone-config-management-is-not-recommended) {% endif %}.
2+
3+
Most users should use [Multi-region SQL statements]({% link {{ page.version.version }}/multiregion-overview.md %}) instead; if additional control is needed, [Zone config extensions]({% link {{ page.version.version }}/zone-config-extensions.md %}) can be used to augment the multi-region SQL statements.

Diff for: src/current/v24.3/alter-database.md

+19
@@ -169,6 +169,10 @@ For usage, see [Synopsis](#synopsis).
169169
If you directly change a database's zone configuration with `ALTER DATABASE ... CONFIGURE ZONE`, CockroachDB will block all [`ALTER DATABASE ... SET PRIMARY REGION`](#set-primary-region) statements on the database.
170170
{{site.data.alerts.end}}
171171

172+
{{site.data.alerts.callout_danger}}
173+
{% include {{ page.version.version }}/zone-configs/avoid-manual-zone-configs.md %}
174+
{{site.data.alerts.end}}
175+
172176
You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.
173177

174178
For examples, see [Replication Controls](#configure-replication-zones).
@@ -689,6 +693,10 @@ HINT: you must first drop super region usa before you can drop the region us-wes
689693

690694
### Configure replication zones
691695

696+
{{site.data.alerts.callout_danger}}
697+
{% include {{ page.version.version }}/zone-configs/avoid-manual-zone-configs.md %}
698+
{{site.data.alerts.end}}
699+
692700
{% include {{ page.version.version }}/sql/movr-statements-geo-partitioned-replicas.md %}
693701

694702
#### Create a replication zone for a database
@@ -715,6 +723,10 @@ You cannot `DISCARD` any zone configurations on multi-region tables, indexes, or
715723
ALTER DATABASE movr CONFIGURE ZONE DISCARD;
716724
~~~
717725

726+
### Troubleshoot replication zones
727+
728+
{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}
729+
718730
### Use Zone Config Extensions
719731

720732
The following examples show:
@@ -1078,6 +1090,12 @@ When you discard a zone configuration, the objects it was applied to will then i
10781090
However, this statement will not remove any configuration created by the [multi-region abstractions]({% link {{ page.version.version }}/multiregion-overview.md %}).
10791091
{{site.data.alerts.end}}
10801092

1093+
#### Troubleshoot Zone Config Extensions
1094+
1095+
The process for troubleshooting Zone Config Extensions is the same as troubleshooting any other changes to zone configs.
1096+
1097+
{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}
1098+
10811099
### Change database owner
10821100

10831101
{% include {{page.version.version}}/sql/movr-statements.md %}
@@ -1283,3 +1301,4 @@ For more information about the region survival goal, see [Surviving region failu
12831301
- [`ALTER TABLE`]({% link {{ page.version.version }}/alter-table.md %})
12841302
- [Online Schema Changes]({% link {{ page.version.version }}/online-schema-changes.md %})
12851303
- [SQL Statements]({% link {{ page.version.version }}/sql-statements.md %})
1304+
- [Troubleshoot Replication Zones]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})

Diff for: src/current/v24.3/alter-index.md

+6 -2
@@ -47,12 +47,12 @@ Subcommand | Description |
4747

4848
`ALTER INDEX ... CONFIGURE ZONE` is used to add, modify, reset, or remove replication zones for an index. To view details about existing replication zones, use [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}). For more information about replication zones, see [Replication Controls]({% link {{ page.version.version }}/configure-replication-zones.md %}).
4949

50-
51-
5250
You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.
5351

5452
For examples, see [Replication Controls](#configure-replication-zones).
5553

54+
{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}
55+
5656
#### Required privileges
5757

5858
The user must be a member of the [`admin` role]({% link {{ page.version.version }}/security-reference/authorization.md %}#admin-role) or have been granted [`CREATE`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) or [`ZONECONFIG`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) privileges. To configure [`system` objects]({% link {{ page.version.version }}/configure-replication-zones.md %}#for-system-data), the user must be a member of the `admin` role.
@@ -225,6 +225,10 @@ You cannot `DISCARD` any zone configurations on multi-region tables, indexes, or
225225
ALTER INDEX vehicles@vehicles_auto_index_fk_city_ref_users CONFIGURE ZONE DISCARD;
226226
~~~
227227

228+
#### Troubleshoot replication zones
229+
230+
{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}
231+
228232
### Define partitions
229233

230234
#### Define a list partition on an index

Diff for: src/current/v24.3/alter-partition.md

+10
@@ -9,6 +9,8 @@ docs_area: reference.sql
99

1010
To view details about existing replication zones, use [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}). For more information about replication zones, see [Replication Controls]({% link {{ page.version.version }}/configure-replication-zones.md %}).
1111

12+
{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}
13+
1214
You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.
1315

1416

@@ -44,3 +46,11 @@ The user must have the [`CREATE`]({% link {{ page.version.version }}/grant.md %}
4446
### Create a replication zone for a partition
4547

4648
{% include {{ page.version.version }}/zone-configs/create-a-replication-zone-for-a-table-partition.md hide-enterprise-warning="true" %}
49+
50+
{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}
51+
52+
## See also
53+
54+
- [Table partitioning]({% link {{page.version.version}}/partitioning.md %})
55+
- [`SHOW PARTITIONS`]({% link {{page.version.version}}/show-partitions.md %})
56+
- [Troubleshoot Replication Zones]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})

Diff for: src/current/v24.3/alter-range.md

+9 -7
@@ -34,9 +34,11 @@ Additional parameters are documented for the respective [subcommands](#subcomman
3434

3535
### `CONFIGURE ZONE`
3636

37-
`ALTER RANGE ... CONFIGURE ZONE` is used to add, modify, reset, or remove replication zones for a range. To view details about existing replication zones, see [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}).
37+
`ALTER RANGE ... CONFIGURE ZONE` is used to add, modify, reset, or remove [replication zones]({% link {{ page.version.version }}/configure-replication-zones.md %}) for a range. To view details about existing replication zones, see [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}).
3838

39-
You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.
39+
You can use replication zones to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.
40+
41+
{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}
4042

4143
#### Required privileges
4244

@@ -121,7 +123,7 @@ For example, to get all range IDs, leaseholder store IDs, and leaseholder locali
121123

122124
{% include_cached copy-clipboard.html %}
123125
~~~ sql
124-
WITH user_info AS (SHOW RANGES FROM TABLE users) SELECT range_id, lease_holder, lease_holder_locality FROM user_info;
126+
WITH user_info AS (SHOW RANGES FROM TABLE users WITH DETAILS) SELECT range_id, lease_holder, lease_holder_locality FROM user_info;
125127
~~~
126128

127129
~~~
@@ -163,7 +165,7 @@ To move the leases for all data in the [`movr.users`]({% link {{ page.version.ve
163165

164166
{% include_cached copy-clipboard.html %}
165167
~~~ sql
166-
ALTER RANGE RELOCATE LEASE TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users'
168+
ALTER RANGE RELOCATE LEASE TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
167169
~~~
168170

169171
~~~
@@ -205,7 +207,7 @@ To move the replicas for all data in the [`movr.users`]({% link {{ page.version.
205207

206208
{% include_cached copy-clipboard.html %}
207209
~~~ sql
208-
ALTER RANGE RELOCATE FROM 2 TO 7 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
210+
ALTER RANGE RELOCATE FROM 2 TO 7 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
209211
~~~
210212

211213
~~~
@@ -231,7 +233,7 @@ To move all of a range's voting replicas from one store to another store:
231233

232234
{% include_cached copy-clipboard.html %}
233235
~~~ sql
234-
ALTER RANGE RELOCATE VOTERS FROM 7 TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
236+
ALTER RANGE RELOCATE VOTERS FROM 7 TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
235237
~~~
236238

237239
~~~
@@ -261,7 +263,7 @@ This statement will only have an effect on clusters that have non-voting replica
261263

262264
{% include_cached copy-clipboard.html %}
263265
~~~ sql
264-
ALTER RANGE RELOCATE NONVOTERS FROM 7 TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
266+
ALTER RANGE RELOCATE NONVOTERS FROM 7 TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
265267
~~~
266268

267269
~~~

Diff for: src/current/v24.3/alter-table.md

+2 -2
@@ -223,6 +223,8 @@ You can use *replication zones* to control the number and location of replicas f
223223

224224
For examples, see [Replication Controls](#configure-replication-zones).
225225

226+
{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}
227+
226228
#### Required privileges
227229

228230
The user must be a member of the [`admin` role]({% link {{ page.version.version }}/security-reference/authorization.md %}#admin-role) or have been granted [`CREATE`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) or [`ZONECONFIG`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) privileges. To configure [`system` objects]({% link {{ page.version.version }}/configure-replication-zones.md %}#for-system-data), the user must be a member of the `admin` role.
@@ -358,8 +360,6 @@ For usage, see [Synopsis](#synopsis).
358360

359361
`ALTER TABLE ... PARTITION BY` is used to partition, re-partition, or un-partition a table. After defining partitions, [`CONFIGURE ZONE`](#configure-zone) is used to control the replication and placement of partitions.
360362

361-
362-
363363
For examples, see [Define partitions](#define-partitions).
364364

365365
#### Parameters

Diff for: src/current/v24.3/backup.md

+2 -1
@@ -33,10 +33,10 @@ To view the contents of an backup created with the `BACKUP` statement, use [`SHO
3333
## Considerations
3434

3535
- [Full cluster backups](#back-up-a-cluster) include [license keys]({% link {{ page.version.version }}/licensing-faqs.md %}#set-a-license). When you [restore]({% link {{ page.version.version }}/restore.md %}) a full cluster backup that includes a license, the license is also restored.
36-
- [Zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}) present on the destination cluster prior to a restore will be **overwritten** during a [cluster restore]({% link {{ page.version.version }}/restore.md %}#full-cluster) with the zone configurations from the [backed up cluster](#back-up-a-cluster). If there were no customized zone configurations on the cluster when the backup was taken, then after the restore the destination cluster will use the zone configuration from the [`RANGE DEFAULT` configuration]({% link {{ page.version.version }}/configure-replication-zones.md %}#view-the-default-replication-zone).
3736
- You cannot restore a backup of a multi-region database into a single-region database.
3837
- Exclude a table's row data from a backup using the [`exclude_data_from_backup`]({% link {{ page.version.version }}/take-full-and-incremental-backups.md %}#exclude-a-tables-data-from-backups) parameter.
3938
- `BACKUP` is a blocking statement. To run a backup job asynchronously, use the `DETACHED` option. See the [options](#options) below.
39+
- {% include {{ page.version.version }}/backups/zone-configs-overwritten-during-restore.md %}
4040

4141
### Storage considerations
4242

@@ -378,3 +378,4 @@ To use an external connection URI to back up to cloud storage with an associated
378378
- [`CREATE SCHEDULE FOR BACKUP`]({% link {{ page.version.version }}/create-schedule-for-backup.md %})
379379
- [`RESTORE`]({% link {{ page.version.version }}/restore.md %})
380380
- [Replication Controls]({% link {{ page.version.version }}/configure-replication-zones.md %})
381+
- [Troubleshoot Replication Zones]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})

Diff for: src/current/v24.3/cluster-api.md

+1 -1
@@ -21,7 +21,7 @@ Endpoint | Name | Description | Support
2121
[`/databases/{database}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseDetails) | Get database details | Get the descriptor ID of a specified database. | Stable
2222
[`/databases/{database}/grants`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseGrants) | List database grants | List all [privileges](security-reference/authorization.html#managing-privileges) granted to users for a specified database. | Stable
2323
[`/databases/{database}/tables`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseTables) | List database tables | List all tables in a specified database. | Stable
24-
[`/databases/{database}/tables/{table}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/tableDetails) | Get table details | Get details on a specified table, including schema, grants, indexes, range count, and zone configuration. | Stable
24+
[`/databases/{database}/tables/{table}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/tableDetails) | Get table details | Get details on a specified table, including schema, grants, indexes, range count, and [zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}). | Stable
2525
[`/events`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/listEvents) | List events | List the latest [events](eventlog.html) on the cluster, in descending order. | Unstable
2626
[`/health`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/health) | Check node health | Determine if the node is running and ready to accept SQL connections. | Stable
2727
[`/nodes`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/listNodes) | List nodes | Get details on all nodes in the cluster, including node IDs, software versions, and hardware. | Stable

Diff for: src/current/v24.3/cluster-setup-troubleshooting.md

+12 -12
@@ -587,6 +587,18 @@ If you still see under-replicated/unavailable ranges on the Cluster Overview pag
587587
1. To view the **Range Report** for a range, click on the range number in the **Under-replicated (or slow)** table or **Unavailable** table.
588588
1. On the Range Report page, scroll down to the **Simulated Allocator Output** section. The table contains an error message which explains the reason for the under-replicated range. Follow the guidance in the message to resolve the issue. If you need help understanding the error or the guidance, [file an issue]({% link {{ page.version.version }}/file-an-issue.md %}). Please be sure to include the full Range Report and error message when you submit the issue.
589589
590+
#### Check for under-replicated or unavailable data
591+
592+
To see if any data is under-replicated or unavailable in your cluster, follow the steps described in [Critical nodes endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#critical-nodes-endpoint).
593+
594+
#### Check for replication zone constraint violations
595+
596+
To see if any of your cluster's [data placement constraints]({% link {{ page.version.version }}/configure-replication-zones.md %}#replication-constraints) are being violated, follow the steps described in [Troubleshoot Replication Zones]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %}).
597+
598+
#### Check for critical localities
599+
600+
To see which of your [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) (if any) are critical, follow the steps described in the [Critical nodes endpoint documentation]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#critical-nodes-endpoint). A locality is "critical" for a range if all of the nodes in that locality becoming [unreachable](#node-liveness-issues) would cause the range to become unavailable. In other words, the locality contains a majority of the range's replicas.
601+
590602
## Node liveness issues
591603
592604
"Node liveness" refers to whether a node in your cluster has been determined to be "dead" or "alive" by the rest of the cluster. This is achieved using checks that ensure that each node connected to the cluster is updating its liveness record. This information is shared with the rest of the cluster using an internal gossip protocol.
@@ -633,18 +645,6 @@ If your cluster is in a partially-available state due to a recent node or networ
633645
634646
Even with `server.eventlog.enabled` set to `false`, notable log events are still sent to configured [log sinks]({% link {{ page.version.version }}/configure-logs.md %}#configure-log-sinks) as usual.
635647
636-
## Check for under-replicated or unavailable data
637-
638-
To see if any data is under-replicated or unavailable in your cluster, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}).
639-
640-
## Check for replication zone constraint violations
641-
642-
To see if any of your cluster's [data placement constraints]({% link {{ page.version.version }}/configure-replication-zones.md %}#replication-constraints) are being violated, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}).
643-
644-
## Check for critical localities
645-
646-
To see which of your [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) (if any) are critical, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}). A locality is "critical" for a range if all of the nodes in that locality becoming [unreachable](#node-liveness-issues) would cause the range to become unavailable. In other words, the locality contains a majority of the range's replicas.
647-
648648
## Something else?
649649
650650
If we do not have a solution here, you can try using our other [support resources]({% link {{ page.version.version }}/support-resources.md %}), including:
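
Note on the "critical localities" definition moved in the diff above: the text says a locality is critical for a range if losing every node in it would make the range unavailable, "in other words" it holds a majority of the range's replicas. The following is a minimal sketch of that check, not CockroachDB's implementation. The function name and the input shape (one locality string per replica) are hypothetical, and it assumes every replica is a voting replica. It tests the first phrasing (fewer than a quorum of replicas survive), which coincides with "holds a majority" for odd replica counts such as 3 or 5.

```python
from collections import Counter


def critical_localities(replica_localities):
    """Return the localities whose loss would leave one range without quorum.

    replica_localities: list with one locality string per replica of the
    range (hypothetical input shape; all replicas are assumed to be voters).
    """
    total = len(replica_localities)
    quorum = total // 2 + 1  # replicas that must survive for availability
    counts = Counter(replica_localities)
    # A locality is critical if the replicas outside it cannot form a quorum.
    return sorted(
        locality
        for locality, held in counts.items()
        if total - held < quorum
    )


# Three replicas, two of them in the same locality.
print(critical_localities(["us-east", "us-east", "us-west"]))
```

With three replicas spread across three different localities, the same function returns an empty list, since losing any single locality still leaves two of three replicas.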
