Commit 426a48e

DOC-12277 Add contention examples using crdb_internal.transaction_contention_events (#19377)
* Added monitor-and-analyze-transaction-contention.md with images.
* In crdb-internal.md, moved the column table to include file transaction-contention-events-columns.md.
* In optimize-performance.json, added link to monitor-and-analyze-transaction-contention.html.
* Incorporated Jon St. John’s feedback.
* Incorporated DavidH and Xin’s comments from slack.
* Incorporated suggestions from docs-reviewer-gpt.
* Incorporated Kevin’s feedback. Copied v25.1 changes to v25.2.
* Incorporated Rich’s feedback part 1.
* Incorporated Rich’s feedback part 2. Copied v25.1 changes to v25.2.
1 parent f100ca0 commit 426a48e

27 files changed, +1253 -20 lines changed
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,3 @@
1+
{{site.data.alerts.callout_danger}}
2+
Not all `crdb_internal` tables are production-ready. Consult the [`crdb_internal`]({% link {{ page.version.version }}/crdb-internal.md %}#tables) page for their current status.
3+
{{site.data.alerts.end}}
@@ -0,0 +1,3 @@
1+
{{site.data.alerts.callout_danger}}
2+
Querying the `crdb_internal.cluster_locks` table triggers an RPC fan-out to all nodes in the cluster, which can make it a relatively expensive operation.
3+
{{site.data.alerts.end}}
@@ -0,0 +1,3 @@
1+
{{site.data.alerts.callout_danger}}
2+
Querying the `crdb_internal.transaction_contention_events` table triggers an expensive RPC fan-out to all nodes, making it a resource-intensive operation. Avoid frequent polling and do not use this table for continuous monitoring.
3+
{{site.data.alerts.end}}

src/current/_includes/v25.1/essential-metrics.md

+1-1
@@ -130,7 +130,7 @@ The **Usage** column explains why each metric is important to visualize in a cus
130130
| <a id="sql-service-latency"></a>sql.service.latency-p90, sql.service.latency-p99 | sql.service.latency | Latency of SQL request execution | These high-level metrics reflect workload performance. Monitor these metrics to understand latency over time. If abnormal patterns emerge, apply the metric's time range to the [**SQL Activity** pages]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#sql-activity-pages) to investigate interesting outliers or patterns. The [**Statements page**]({% link {{ page.version.version }}/ui-statements-page.md %}) has P90 Latency and P99 latency columns to enable correlation with this metric. |
131131
| sql.txn.latency-p90, sql.txn.latency-p99 | sql.txn.latency | Latency of SQL transactions | These high-level metrics provide a latency histogram of all executed SQL transactions. These metrics provide an overview of the current SQL workload. |
132132
| txnwaitqueue.deadlocks_total | {% if include.deployment == 'self-hosted' %}txnwaitqueue.deadlocks.count |{% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Number of deadlocks detected by the transaction wait queue | Alert on this metric if its value is greater than zero, especially if transaction throughput is lower than expected. Applications should be able to detect and recover from deadlock errors. However, transaction performance and throughput can be maximized if the application logic avoids deadlock conditions in the first place, for example, by keeping transactions as short as possible. |
133-
| sql.distsql.contended_queries.count | {% if include.deployment == 'self-hosted' %}sql.distsql.contended.queries |{% elsif include.deployment == 'advanced' %} sql.distsql.contended.queries |{% endif %} Number of SQL queries that experienced contention | This metric is incremented whenever there is a non-trivial amount of contention experienced by a statement whether read-write or write-write conflicts. Monitor this metric to correlate possible workload performance issues to contention conflicts. |
133+
| <a id="sql-distsql-contended-queries-count"></a>sql.distsql.contended_queries.count | {% if include.deployment == 'self-hosted' %}sql.distsql.contended.queries |{% elsif include.deployment == 'advanced' %} sql.distsql.contended.queries |{% endif %} Number of SQL queries that experienced contention | This metric is incremented whenever there is a non-trivial amount of contention experienced by a statement whether read-write or write-write conflicts. Monitor this metric to correlate possible workload performance issues to contention conflicts. |
134134
| <a id="sql-conn-failures"></a>sql.conn.failures | sql.conn.failures.count | Number of SQL connection failures | This metric is incremented whenever a connection attempt fails for any reason, including timeouts. |
135135
| <a id="sql-conn-latency"></a>sql.conn.latency-p90, sql.conn.latency-p99 | sql.conn.latency | Latency to establish and authenticate a SQL connection | These metrics characterize the database connection latency which can affect the application performance, for example, by having slow startup times. Connection failures are not recorded in these metrics.|
136136
| txn.restarts.serializable | txn.restarts.serializable | Number of restarts due to a forwarded commit timestamp and isolation=SERIALIZABLE | This metric is one measure of the impact of contention conflicts on workload performance. For guidance on contention conflicts, review [transaction contention best practices]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention) and [performance tuning recipes]({% link {{ page.version.version }}/performance-recipes.md %}#transaction-contention). Tens of restarts per minute may be a high value, a signal of an elevated degree of contention in the workload, which should be investigated. |

src/current/_includes/v25.1/sidebar-data/optimize-performance.json

+6
@@ -57,6 +57,12 @@
5757
"/${VERSION}/admission-control.html"
5858
]
5959
},
60+
{
61+
"title": "Monitor and Analyze Transaction Contention",
62+
"urls": [
63+
"/${VERSION}/monitor-and-analyze-transaction-contention.html"
64+
]
65+
},
6066
{
6167
"title": "Performance Tuning Recipes",
6268
"urls": [
@@ -0,0 +1,17 @@
1+
Column | Type | Description
2+
-------|------|------------
3+
`collection_ts` | `TIMESTAMPTZ NOT NULL` | The timestamp when the transaction [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention) event was collected.
4+
`blocking_txn_id` | `UUID NOT NULL` | The ID of the blocking transaction. You can join this column into the [`cluster_contention_events`]({% link {{ page.version.version }}/crdb-internal.md %}#cluster_contention_events) table.
5+
`blocking_txn_fingerprint_id` | `BYTES NOT NULL`| The ID of the blocking transaction fingerprint. To surface historical information about the transactions that caused the [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention), you can join this column into the [`statement_statistics`]({% link {{ page.version.version }}/crdb-internal.md %}#statement_statistics) and [`transaction_statistics`]({% link {{ page.version.version }}/crdb-internal.md %}#transaction_statistics) tables.
6+
`waiting_txn_id` | `UUID NOT NULL` | The ID of the waiting transaction. You can join this column into the [`cluster_contention_events`]({% link {{ page.version.version }}/crdb-internal.md %}#cluster_contention_events) table.
7+
`waiting_txn_fingerprint_id` | `BYTES NOT NULL` | The ID of the waiting transaction fingerprint. To surface historical information about the transactions that caused the [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention), you can join this column into the [`statement_statistics`]({% link {{ page.version.version }}/crdb-internal.md %}#statement_statistics) and [`transaction_statistics`]({% link {{ page.version.version }}/crdb-internal.md %}#transaction_statistics) tables.
8+
`waiting_stmt_id` | `STRING NOT NULL` | The ID of the waiting statement, which is unique for each statement execution.
9+
`waiting_stmt_fingerprint_id` | `BYTES NOT NULL` | The ID of the waiting statement fingerprint. To surface historical information about the statements that caused the [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention), you can join this column into the [`statement_statistics`]({% link {{ page.version.version }}/crdb-internal.md %}#statement_statistics) table.
10+
`contention_duration` | `INTERVAL NOT NULL` | The interval of time the waiting transaction spent waiting for the blocking transaction.
11+
`contending_key` | `BYTES NOT NULL` | The key on which the transactions contended.
12+
`contending_pretty_key` | `STRING NOT NULL` | The specific key that was involved in the contention event, in readable format.
13+
`database_name` | `STRING NOT NULL` | The database where the contention occurred.
14+
`schema_name` | `STRING NOT NULL` | The schema where the contention occurred.
15+
`table_name` | `STRING NOT NULL` | The table where the contention occurred.
16+
`index_name` | `STRING NULL` | The index where the contention occurred.
17+
`contention_type` | `STRING NOT NULL` | The type of contention. Possible values:<ul><li>`LOCK_WAIT`: Indicates that the transaction waited on a specific key. The record includes the key and the wait duration.</li><li>`SERIALIZATION_CONFLICT`: Represents a serialization conflict specific to a transaction execution. This is recorded only when a [client-side retry error]({% link {{ page.version.version }}/transaction-retry-error-reference.md %}) containing the conflicting transaction details is emitted.</li></ul>After recording, the `contention_type` is not modified. A transaction may have multiple `LOCK_WAIT` events, as they correspond to specific keys, but only one `SERIALIZATION_CONFLICT` event.
@@ -0,0 +1,3 @@
1+
{{site.data.alerts.callout_danger}}
2+
Querying the `crdb_internal.cluster_locks` table triggers an RPC fan-out to all nodes in the cluster, which can make it a relatively expensive operation.
3+
{{site.data.alerts.end}}
@@ -0,0 +1,3 @@
1+
{{site.data.alerts.callout_danger}}
2+
Querying the `crdb_internal.transaction_contention_events` table triggers an expensive RPC fan-out to all nodes, making it a resource-intensive operation. Avoid frequent polling and do not use this table for continuous monitoring.
3+
{{site.data.alerts.end}}

src/current/_includes/v25.2/essential-metrics.md

+1-1
@@ -130,7 +130,7 @@ The **Usage** column explains why each metric is important to visualize in a cus
130130
| <a id="sql-service-latency"></a>sql.service.latency-p90, sql.service.latency-p99 | sql.service.latency | Latency of SQL request execution | These high-level metrics reflect workload performance. Monitor these metrics to understand latency over time. If abnormal patterns emerge, apply the metric's time range to the [**SQL Activity** pages]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#sql-activity-pages) to investigate interesting outliers or patterns. The [**Statements page**]({% link {{ page.version.version }}/ui-statements-page.md %}) has P90 Latency and P99 latency columns to enable correlation with this metric. |
131131
| sql.txn.latency-p90, sql.txn.latency-p99 | sql.txn.latency | Latency of SQL transactions | These high-level metrics provide a latency histogram of all executed SQL transactions. These metrics provide an overview of the current SQL workload. |
132132
| txnwaitqueue.deadlocks_total | {% if include.deployment == 'self-hosted' %}txnwaitqueue.deadlocks.count |{% elsif include.deployment == 'advanced' %}NOT AVAILABLE |{% endif %} Number of deadlocks detected by the transaction wait queue | Alert on this metric if its value is greater than zero, especially if transaction throughput is lower than expected. Applications should be able to detect and recover from deadlock errors. However, transaction performance and throughput can be maximized if the application logic avoids deadlock conditions in the first place, for example, by keeping transactions as short as possible. |
133-
| sql.distsql.contended_queries.count | {% if include.deployment == 'self-hosted' %}sql.distsql.contended.queries |{% elsif include.deployment == 'advanced' %} sql.distsql.contended.queries |{% endif %} Number of SQL queries that experienced contention | This metric is incremented whenever there is a non-trivial amount of contention experienced by a statement whether read-write or write-write conflicts. Monitor this metric to correlate possible workload performance issues to contention conflicts. |
133+
| <a id="sql-distsql-contended-queries-count"></a>sql.distsql.contended_queries.count | {% if include.deployment == 'self-hosted' %}sql.distsql.contended.queries |{% elsif include.deployment == 'advanced' %} sql.distsql.contended.queries |{% endif %} Number of SQL queries that experienced contention | This metric is incremented whenever there is a non-trivial amount of contention experienced by a statement whether read-write or write-write conflicts. Monitor this metric to correlate possible workload performance issues to contention conflicts. |
134134
| <a id="sql-conn-failures"></a>sql.conn.failures | sql.conn.failures.count | Number of SQL connection failures | This metric is incremented whenever a connection attempt fails for any reason, including timeouts. |
135135
| <a id="sql-conn-latency"></a>sql.conn.latency-p90, sql.conn.latency-p99 | sql.conn.latency | Latency to establish and authenticate a SQL connection | These metrics characterize the database connection latency which can affect the application performance, for example, by having slow startup times. Connection failures are not recorded in these metrics.|
136136
| txn.restarts.serializable | txn.restarts.serializable | Number of restarts due to a forwarded commit timestamp and isolation=SERIALIZABLE | This metric is one measure of the impact of contention conflicts on workload performance. For guidance on contention conflicts, review [transaction contention best practices]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention) and [performance tuning recipes]({% link {{ page.version.version }}/performance-recipes.md %}#transaction-contention). Tens of restarts per minute may be a high value, a signal of an elevated degree of contention in the workload, which should be investigated. |

src/current/_includes/v25.2/sidebar-data/optimize-performance.json

+6
@@ -57,6 +57,12 @@
5757
"/${VERSION}/admission-control.html"
5858
]
5959
},
60+
{
61+
"title": "Monitor and Analyze Transaction Contention",
62+
"urls": [
63+
"/${VERSION}/monitor-and-analyze-transaction-contention.html"
64+
]
65+
},
6066
{
6167
"title": "Performance Tuning Recipes",
6268
"urls": [
@@ -0,0 +1,17 @@
1+
Column | Type | Description
2+
-------|------|------------
3+
`collection_ts` | `TIMESTAMPTZ NOT NULL` | The timestamp when the transaction [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention) event was collected.
4+
`blocking_txn_id` | `UUID NOT NULL` | The ID of the blocking transaction. You can join this column into the [`cluster_contention_events`]({% link {{ page.version.version }}/crdb-internal.md %}#cluster_contention_events) table.
5+
`blocking_txn_fingerprint_id` | `BYTES NOT NULL`| The ID of the blocking transaction fingerprint. To surface historical information about the transactions that caused the [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention), you can join this column into the [`statement_statistics`]({% link {{ page.version.version }}/crdb-internal.md %}#statement_statistics) and [`transaction_statistics`]({% link {{ page.version.version }}/crdb-internal.md %}#transaction_statistics) tables.
6+
`waiting_txn_id` | `UUID NOT NULL` | The ID of the waiting transaction. You can join this column into the [`cluster_contention_events`]({% link {{ page.version.version }}/crdb-internal.md %}#cluster_contention_events) table.
7+
`waiting_txn_fingerprint_id` | `BYTES NOT NULL` | The ID of the waiting transaction fingerprint. To surface historical information about the transactions that caused the [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention), you can join this column into the [`statement_statistics`]({% link {{ page.version.version }}/crdb-internal.md %}#statement_statistics) and [`transaction_statistics`]({% link {{ page.version.version }}/crdb-internal.md %}#transaction_statistics) tables.
8+
`waiting_stmt_id` | `STRING NOT NULL` | The ID of the waiting statement, which is unique for each statement execution.
9+
`waiting_stmt_fingerprint_id` | `BYTES NOT NULL` | The ID of the waiting statement fingerprint. To surface historical information about the statements that caused the [contention]({% link {{ page.version.version }}/performance-best-practices-overview.md %}#transaction-contention), you can join this column into the [`statement_statistics`]({% link {{ page.version.version }}/crdb-internal.md %}#statement_statistics) table.
10+
`contention_duration` | `INTERVAL NOT NULL` | The interval of time the waiting transaction spent waiting for the blocking transaction.
11+
`contending_key` | `BYTES NOT NULL` | The key on which the transactions contended.
12+
`contending_pretty_key` | `STRING NOT NULL` | The specific key that was involved in the contention event, in readable format.
13+
`database_name` | `STRING NOT NULL` | The database where the contention occurred.
14+
`schema_name` | `STRING NOT NULL` | The schema where the contention occurred.
15+
`table_name` | `STRING NOT NULL` | The table where the contention occurred.
16+
`index_name` | `STRING NULL` | The index where the contention occurred.
17+
`contention_type` | `STRING NOT NULL` | The type of contention. Possible values:<ul><li>`LOCK_WAIT`: Indicates that the transaction waited on a specific key. The record includes the key and the wait duration.</li><li>`SERIALIZATION_CONFLICT`: Represents a serialization conflict specific to a transaction execution. This is recorded only when a [client-side retry error]({% link {{ page.version.version }}/transaction-retry-error-reference.md %}) containing the conflicting transaction details is emitted.</li></ul>After recording, the `contention_type` is not modified. A transaction may have multiple `LOCK_WAIT` events, as they correspond to specific keys, but only one `SERIALIZATION_CONFLICT` event.
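
As a reading aid for the column table above, here is a minimal Python sketch of how rows with these columns might be summarized once they have been fetched by a single, one-off query (not a polling loop, given the fan-out warning in this commit). The sample rows, the table and index names in them, and the `summarize` helper are all hypothetical and are not part of the commit; only the column names come from the table.

```python
from collections import defaultdict
from datetime import timedelta

# Hypothetical sample rows using a subset of the documented columns.
# Real rows would come from crdb_internal.transaction_contention_events.
SAMPLE_EVENTS = [
    {"table_name": "orders", "index_name": "orders_pkey",
     "contention_type": "LOCK_WAIT",
     "contention_duration": timedelta(milliseconds=120)},
    {"table_name": "orders", "index_name": "orders_pkey",
     "contention_type": "LOCK_WAIT",
     "contention_duration": timedelta(milliseconds=80)},
    {"table_name": "accounts", "index_name": "accounts_pkey",
     "contention_type": "LOCK_WAIT",
     "contention_duration": timedelta(milliseconds=50)},
    # index_name is documented as nullable, so None must be handled.
    {"table_name": "accounts", "index_name": None,
     "contention_type": "SERIALIZATION_CONFLICT",
     "contention_duration": timedelta(milliseconds=10)},
]


def summarize(events):
    """Group events by (table, index, type); count them and total the wait."""
    totals = defaultdict(lambda: {"events": 0, "total_wait": timedelta()})
    for event in events:
        key = (event["table_name"], event["index_name"],
               event["contention_type"])
        totals[key]["events"] += 1
        totals[key]["total_wait"] += event["contention_duration"]
    # Sort so the group with the longest cumulative wait comes first.
    return sorted(totals.items(),
                  key=lambda item: item[1]["total_wait"],
                  reverse=True)


if __name__ == "__main__":
    for (table, index, ctype), stats in summarize(SAMPLE_EVENTS):
        print(table, index, ctype, stats["events"], stats["total_wait"])
```

Grouping by table, index, and `contention_type` keeps `LOCK_WAIT` and `SERIALIZATION_CONFLICT` events separate, which matches the table's note that a transaction may have multiple `LOCK_WAIT` events but only one `SERIALIZATION_CONFLICT` event.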