Azure SQL Database tips
This wiki page describes get-sqldb-tips.sql, a T-SQL script that produces tips to improve database design, health, and performance in Azure SQL Database. Tips are based on well-established best practices for the MSSQL database engine, with particular focus on Azure SQL Database specifics.
The script works on all Azure SQL Database service tiers, including Basic, Standard, Premium, General Purpose, Business Critical, and Hyperscale. Both single databases and databases in elastic pools are supported.
The script checks over 40 conditions to produce tips. New tips are added based on our customer engagements and community feedback.
After reading this wiki page, see FAQ for answers to common questions. For changes in each release, see Change log.
Download the latest release of the script. Unzip, and open the get-sqldb-tips.sql file in a query window in SQL Server Management Studio (SSMS), Azure Data Studio (ADS), Query editor in Azure portal, or any MSSQL client tool of your choice. Connect the query window to your target database and execute the script. If your database uses compatibility level 100, use the get-sqldb-tips-compat-level-100-only.sql file.
The script usually executes in about a minute or less. It may take longer on resource-constrained service objectives such as Basic, S0-S2, or when CPU utilization is high.
The result set has the following columns:
- tip_id: Unique numeric identifier for a tip, included in tip descriptions on this page.
- description: Short description of a tip.
- confidence_percent: Our degree of confidence that following the tip is worthwhile and will result in a net benefit. Tips with lower confidence may require additional analysis and testing in the context of a specific application and workload. In the right circumstances, even tips with relatively low confidence can produce substantial benefits. This confidence value applies to the tip definition and does not change from one execution of the script to the next.
- additional_info_url: Pasting this URL into a browser window opens a detailed tip description on this wiki page. Descriptions include recommended actions, sample commands and queries, and an explanation of the data in the details column. There are also links to additional resources (documentation, blogs, etc.) that provide background information on the topic.
- details: Additional data to help you act on a tip. For example, this can be the list of top queries, or the list of tables/indexes that are data compression candidates. In SSMS and ADS, clicking on this column displays the data in a new window for easier reading. These details can change from one execution of the script to the next as database state changes.
To return tips in a single JSON document instead of the default tabular format, set the @JSONOutput variable to 1. This is useful if you want to send script results to others in a portable format, or if you are using Query editor in Azure portal and want to see the entire details column.
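For example, locate the existing @JSONOutput declaration near the top of the script and change its value (the declaration below is illustrative; edit the declaration already present in the script rather than adding a new one):
-- 1: return a single JSON document; 0 (default): return tabular results
DECLARE @JSONOutput bit = 1;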
- For each tip in the result, make sure to read the description on this wiki page using the link in additional_info_url, especially if the data in the details column is not immediately clear.
- Execute the script while a representative workload is running. This could mean running the script on a production database, or on a representative copy of a production database. Without a representative workload running, or with a smaller (e.g. non-production) database, relevant tips may be missed, or less accurate tips may be generated. If workload type and intensity change over time, consider executing the script at different points in time to get all relevant tips.
- You can use the script as a troubleshooting tool, executing it when a performance or stability problem is occurring. This increases the chances of producing specific tips that help you fix the problem.
- Some tips are based on system metadata accumulated since database engine startup. In those cases, the details column includes engine startup time. If the time passed since engine startup is short, consider rerunning the script when more time has passed, so that the tip is based on more complete and representative system metadata.
- The script can be executed on either the primary or a readable secondary replica of a database, including read scale-out replicas and geo-replicas. Some tips are produced only on the primary or only on a secondary replica. When the database has any readable secondary replicas, we recommend executing the script on the primary and on all secondary replicas.
In this example, the script returns seven tips. For most tips, there is additional information in the details column.
The security principal executing the script must be either:
- A server administrator using SQL authentication or Azure Active Directory (AAD) authentication;
- A server login with an associated database user in the target database. The server login must be a member of the ##MS_ServerStateReader## server role. Additionally, to evaluate certain tips, the VIEW DEFINITION permission on the database is required.
Here is a code sample that grants the necessary and sufficient access to execute the script and evaluate all tips to a login named tips:
-- In the master database on an Azure SQL Database logical server
CREATE LOGIN tips WITH PASSWORD = 'strong-password-here';
ALTER SERVER ROLE [##MS_ServerStateReader##] ADD MEMBER tips;
-- In the user database where tips are to be produced, on the same logical server
CREATE USER tips FOR LOGIN tips;
GRANT VIEW DEFINITION TO tips;
This code grants permissions equivalent to a union of VIEW SERVER STATE and VIEW DEFINITION, which is required to query the system metadata views used in the script. The ##MS_ServerStateReader## server role does not grant any additional permissions; specifically, it grants no permissions to modify database schema or to read data.
The above permissions are sufficient to execute the script on any service tier. This is the recommended way to run the script because it ensures that all potential tips are evaluated.
On single databases in the Standard (S2 and higher), General Purpose, Premium, Business Critical, and Hyperscale service tiers, for nearly all tips it is sufficient to hold only the VIEW DATABASE STATE and VIEW DEFINITION permissions on the database. These can be granted as follows:
-- In the master database on an Azure SQL Database logical server
CREATE LOGIN tips WITH PASSWORD = 'strong-password-here';
-- In the user database where tips are to be produced, on the same logical server
CREATE USER tips FOR LOGIN tips;
GRANT VIEW DATABASE STATE, VIEW DEFINITION TO tips;
Alternatively, using database-scoped authentication, i.e. without creating a server login in the master database:
-- In the user database where tips are to be produced
-- Using SQL authentication
CREATE USER tips WITH PASSWORD = 'strong-password-here';
GRANT VIEW DATABASE STATE, VIEW DEFINITION TO tips;
-- Or, using Azure Active Directory (AAD) authentication
CREATE USER [[email protected]] FROM EXTERNAL PROVIDER;
GRANT VIEW DATABASE STATE, VIEW DEFINITION TO [[email protected]];
Note that for Basic, S0, and S1 single databases, and for any database in an elastic pool, VIEW DATABASE STATE and VIEW DEFINITION are sufficient to produce only a few tips; the higher permissions described in the previous section are recommended.
If tips were skipped because of insufficient permissions, they will be listed in the second result set that describes skipped tips.
By default, all tips are evaluated. If you prefer to skip some tips, you can easily edit tip metadata in the script to not run the corresponding queries.
To do that, search the script for the Define all tips comment to find the tip definition code. In the VALUES clause, change the first column value from 1 to 0 for the tips you want to exclude. In the following example, tip 1310 (Last partitions...) will be evaluated, while tip 1320 (Top queries...) will be skipped:
(1, 1310, 'Last partitions are not empty', ...),
(0, 1320, 'Top queries should be investigated and tuned', ...),
If any tips were skipped because of user-specified exclusions, they will be listed in the second result set that describes skipped tips.
Tips are produced by analyzing database engine metadata in system views. For many tips, heuristics (based on configurable thresholds) are used, thus the output is produced with a finite degree of confidence, as indicated in the confidence_percent column. For each tip, our goal was to make it useful for a significant portion of our customers, not necessarily for every customer, which would be impossible given the broad variety of designs and workloads across millions of databases in Azure SQL Database.
The script is lightweight and does not query any user data. To avoid impacting workloads, the script checks current CPU utilization, and aborts before executing any other queries if CPU utilization is very high. The script also aborts any query it is running if the query is blocked for more than a few seconds waiting on a lock, to avoid contributing to blocking chains. If any tips were skipped because lock timeout was exceeded, they will be listed in the second result set that describes skipped tips.
The database where you run the script must use compatibility level 110 or later, which is the case for the vast majority of databases in Azure SQL Database. For databases using compatibility level 100, we provide a different script, get-sqldb-tips-compat-level-100-only.sql. When using that script, data in the details column may not be sorted in any particular order.
To determine the compatibility level of your database, execute the following query in the context of the database (not in the master database):
SELECT name, compatibility_level
FROM sys.databases
WHERE database_id = DB_ID();
We welcome and encourage contributions to this project. You can open issues to report bugs and suggest improvements or new tips, or send pull requests to contribute code. We request that you keep the following guidelines in mind:
- A new tip or a change to an existing tip must be implementable in T-SQL using the current surface area of Azure SQL Database;
- A new tip or a change to an existing tip must be useful to a significant portion of Azure SQL Database customers;
- The implementation must be lightweight, i.e. it should not cause high resource utilization or contention that may impact running workloads, and should not cause the script to run for an excessively long time.
The last two guidelines are clearly subjective; we will use community input and our best judgement when considering contributions.
This section includes detailed descriptions for tips that may be produced by the script. Descriptions below are not in any particular order, and should be viewed as collateral to script output, not as a set of general recommendations.
Maximum degree of parallelism (MAXDOP) for primary and secondary replicas is not in the recommended range of 1 to 8. Depending on workload, this may cause unnecessary resource utilization, and in extreme cases, errors and timeouts.
-- Set MAXDOP to 8 for both primary and secondary replicas
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 8;
ALTER DATABASE SCOPED CONFIGURATION FOR SECONDARY SET MAXDOP = PRIMARY;
Changing default MAXDOP in Azure SQL Database
ALTER DATABASE SCOPED CONFIGURATION
Maximum degree of parallelism (MAXDOP) for the primary replica is not in the recommended range of 1 to 8. Depending on workload, this may cause unnecessary resource utilization, and in extreme cases, errors and timeouts.
-- Set MAXDOP to 8 for the primary replica
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 8;
Changing default MAXDOP in Azure SQL Database
ALTER DATABASE SCOPED CONFIGURATION
Maximum degree of parallelism (MAXDOP) for secondary replicas is not in the recommended range of 1 to 8. Depending on workload, this may cause unnecessary resource utilization, and in extreme cases, errors and timeouts.
-- Set MAXDOP to 8 for the secondary replicas
ALTER DATABASE SCOPED CONFIGURATION FOR SECONDARY SET MAXDOP = 8;
Changing default MAXDOP in Azure SQL Database
ALTER DATABASE SCOPED CONFIGURATION
Database compatibility level is not current. Recently added database engine features and improvements may not be available when using older compatibility levels. Consider using the latest supported compatibility level, but note that changing the compatibility level may require functional and performance testing before and after the change, known as A/B testing.
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 150; -- check the first link for the latest compatibility level in Azure SQL Database (150 as of this writing)
Auto-create statistics is disabled. This may cause poor query performance due to suboptimal query plans. Enable auto-create statistics.
ALTER DATABASE CURRENT SET AUTO_CREATE_STATISTICS ON;
Auto-update statistics is disabled. This may cause poor query performance due to suboptimal query plans. Enable auto-update statistics.
ALTER DATABASE CURRENT SET AUTO_UPDATE_STATISTICS ON;
Read Committed Snapshot Isolation (RCSI) is disabled. This may cause unnecessary lock blocking, such as read queries blocking write queries. RCSI is the preferred alternative to the NOLOCK hint, which should not be used in scenarios that require correct query results.
The sample query below enables RCSI for a database. Note that this command requires exclusive database access, and will terminate all connections to the database after waiting for 5 seconds to give open transactions an opportunity to complete.
-- Execute in the master database, and change to reference the actual database name
ALTER DATABASE [database-name] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK AFTER 5 SECONDS;
Snapshot isolation in SQL Server
SET TRANSACTION ISOLATION LEVEL
The Dirty Secrets of NOLOCK, a free course by Kendra Little
Query Store is disabled. This complicates performance tuning and troubleshooting, and could disable certain Intelligent Query Processing features that can improve query performance and reduce resource consumption. Enable Query Store.
ALTER DATABASE CURRENT SET QUERY_STORE = ON;
This tip may only appear on the primary replica.
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
Query Store operation mode is READ_ONLY. This complicates performance tuning and troubleshooting, and impacts certain Intelligent Query Processing features that can improve query performance and reduce resource consumption.
The details column provides the reason for the read-only state. Change Query Store operation mode to READ_WRITE.
ALTER DATABASE CURRENT SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
This tip may only appear on the primary replica.
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
Query Store capture mode is NONE. This complicates performance tuning and troubleshooting, and impacts certain Intelligent Query Processing features that can improve query performance and reduce resource consumption.
Change Query Store query capture mode to AUTO or ALL depending on requirements, or use a custom query capture policy that captures all queries relevant to your workload.
ALTER DATABASE CURRENT SET QUERY_STORE (QUERY_CAPTURE_MODE = AUTO);
ALTER DATABASE CURRENT SET QUERY_STORE (QUERY_CAPTURE_MODE = ALL);
This tip may only appear on the primary replica.
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
AUTO_SHRINK is enabled. An occasional shrinking of a database may be needed in response to a one-time data deletion event and/or to reduce excessive allocated but unused space in data files. However, shrinking should not be executed continuously by keeping auto-shrink enabled at all times, because it causes persistent and high resource utilization that will negatively impact workload performance.
ALTER DATABASE CURRENT SET AUTO_SHRINK OFF;
Considerations for AUTO_SHRINK
Recommendations and guidelines for setting the AUTO_SHRINK database option in SQL Server
Btree indexes have uniqueidentifier (GUID) leading columns. For larger tables, the details column contains a list of btree indexes with uniqueidentifier leading columns. Such indexes are subject to low page density and high fragmentation as data is modified. This leads to increased memory and disk space usage, and negatively impacts performance.
Avoid this pattern in physical database design, particularly for clustered btree indexes. Rebuild indexes periodically when page density is less than 70%, either manually or using a database maintenance solution.
-- Determine average page density for an index.
-- Replace [object_id] and [index_id] placeholders with actual values from the details column, or with NULL/default to see all objects and indexes
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_page_space_used_in_percent,
ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), [object_id], [index_id], default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND
ips.index_id = i.index_id
ORDER BY page_count DESC;
-- Consider rebuilding an index if its average page density is below 70%
-- set FILLFACTOR to 80 if significant amounts of new data are being added, or if the values in the uniqueidentifier index leading column change frequently
ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAXDOP = 8, FILLFACTOR = 100);
SQL Server Index Architecture and Design Guide
Database maintenance solution by Ola Hallengren
FORCE_LAST_GOOD_PLAN auto-tuning option is not enabled. Query plan regressions will not be fixed automatically via plan forcing. To avoid performance issues due to unexpected query plan regressions, enable this auto-tuning option.
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (FORCE_LAST_GOOD_PLAN = ON);
This tip may only appear on the primary replica.
Automatic tuning in Azure SQL Database
Enable automatic tuning in the Azure portal
Used data size within the database is close to maximum configured database size (MAXSIZE). To allow continued data growth, increase maximum database size proactively, or scale up to a service tier or service objective that supports higher maximum database size, or implement data compression, or delete unneeded data.
If used data size is approaching 4 TB, plan on reducing used data size by compressing or deleting data, or switch to the Hyperscale service tier.
-- Determine used space, allocated space, and database MAXSIZE
SELECT SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8 / 1024.) AS space_used_mb,
SUM(CAST(size AS bigint) * 8 / 1024.) AS space_allocated_mb,
CAST(DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') AS bigint) / 1024. / 1024 AS max_size_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';
-- Increase MAXSIZE to 4 TB
ALTER DATABASE [database_name] MODIFY (MAXSIZE = 4096 GB); -- adjust database name and MAXSIZE value according to requirements
Manage file space for databases in Azure SQL Database
What happens when database storage limit is reached
Storage size allocated for data files is close to maximum configured database size (MAXSIZE). If used size is not close to MAXSIZE and significant data growth is not expected, consider shrinking data files to reduce allocated size. This is particularly relevant for databases in elastic pools when the total allocated storage size for the pool is close to pool storage limit.
Note that data file shrink is a long-running and resource-intensive process that should not be executed on a regular basis. Shrink will also increase index fragmentation, which may impact workloads using large read-aheads in index scans.
Prior to shrinking, if page density is low (below 70%), rebuild indexes.
-- Determine used space, allocated space, and database MAXSIZE
SELECT SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8 / 1024.) AS space_used_mb,
SUM(CAST(size AS bigint) * 8 / 1024.) AS space_allocated_mb,
CAST(DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') AS bigint) / 1024. / 1024 AS max_size_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';
-- Determine average page density for each index
-- Replace [object_id] and [index_id] placeholders with actual values from the details column, or with NULL/default to see all objects and indexes
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_page_space_used_in_percent,
ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), [object_id], [index_id], default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND
ips.index_id = i.index_id
ORDER BY page_count DESC;
-- Consider rebuilding an index if its average page density is below 70%
ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAXDOP = 8);
-- First, attempt to truncate to release any space at the end of the file
DBCC SHRINKFILE (1, TRUNCATEONLY); -- change file_id if needed
-- If allocated space is still too large, shrink the file. This takes longer.
DBCC SHRINKFILE (1, 1024); -- change file_id if needed, change target size from 1024 (1 GB) if needed
Manage file space for databases in Azure SQL Database
Storage size allocated for data files is much larger than used data size. If significant data growth is not expected, consider shrinking data files to reduce allocated size. This is particularly relevant for databases in elastic pools when the total allocated storage size for the pool is close to pool storage limit.
Note that data file shrink is a long-running and resource-intensive process that should not be used on a regular basis. Shrink will also increase index fragmentation, which may impact workloads using large read-aheads in index scans.
Prior to shrinking, if page density is low (below 70%), rebuild indexes.
-- Determine used space, allocated space, and database MAXSIZE
SELECT SUM(CAST(FILEPROPERTY(name, 'SpaceUsed') AS bigint) * 8 / 1024.) AS space_used_mb,
SUM(CAST(size AS bigint) * 8 / 1024.) AS space_allocated_mb,
CAST(DATABASEPROPERTYEX(DB_NAME(), 'MaxSizeInBytes') AS bigint) / 1024. / 1024 AS max_size_mb
FROM sys.database_files
WHERE type_desc = 'ROWS';
-- Determine average page density for each index
-- Replace [object_id] and [index_id] placeholders with actual values from the details column, or with NULL/default to see all objects and indexes
SELECT OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_page_space_used_in_percent,
ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), [object_id], [index_id], default, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND
ips.index_id = i.index_id
ORDER BY page_count DESC;
-- Consider rebuilding an index if its average page density is below 70%
ALTER INDEX [index_name] ON [schema_name].[table_name] REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAXDOP = 8);
-- First, attempt to truncate to release any space at the end of the file
DBCC SHRINKFILE (1, TRUNCATEONLY); -- change file_id if needed
-- If allocated space is still too large, shrink the file. This takes longer.
DBCC SHRINKFILE (1, 1024); -- change file_id if needed, change target size from 1024 (1 GB) if needed
Manage file space for databases in Azure SQL Database
Significant CPU throttling has recently occurred, as noted in the details column. If workload performance has been inadequate during that time, tune the query workload to consume less CPU, scale up to a service objective with more CPU capacity, or both. To find queries that have consumed the most CPU in a time interval, review top queries ranked high by cpu time.
CPU throttling is common on lower, resource-constrained service objectives such as Basic, S0, and S1.
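Recent CPU utilization relative to service objective limits can be reviewed with a query along these lines (a minimal sketch, not part of the tips script; sys.dm_db_resource_stats keeps roughly one hour of history at 15-second granularity):
-- Recent CPU utilization, as a percentage of the service objective limit
SELECT end_time,
       avg_cpu_percent,
       avg_instance_cpu_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;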
What happens when CPU resource limits are reached
Out of memory errors have recently occurred, as noted in the details column. Tune the query workload to consume less memory, scale up to a service objective with more memory, or both. To find queries that have consumed the most memory in a time interval, review top queries ranked high by maximum query memory used.
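Recent out-of-memory events can also be examined directly (a sketch; assumes the sys.dm_os_out_of_memory_events DMV documented for Azure SQL Database):
-- Recent out-of-memory events, most recent first
SELECT *
FROM sys.dm_os_out_of_memory_events
ORDER BY event_time DESC;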
What happens when memory resource limits are reached
Memory grant waits and/or timeouts have recently occurred. The data in the details column notes the number of recent requests for memory grants that had to wait (commonly manifested as RESOURCE_SEMAPHORE waits), and the number of such requests that timed out while waiting. To find queries that have consumed the most memory in a time interval, review top queries ranked high by maximum query memory used.
Tune the query workload to use smaller memory grants, reduce the number of queries running concurrently, or scale up to a service objective with more memory.
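Current memory grants, including requests still waiting for a grant, can be inspected as follows (a sketch, not part of the tips script):
-- Current memory grant requests; rows with NULL grant_time are waiting for a grant
SELECT session_id,
       request_time,
       grant_time,
       requested_memory_kb,
       granted_memory_kb,
       wait_time_ms
FROM sys.dm_exec_query_memory_grants
ORDER BY requested_memory_kb DESC;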
What happens when memory resource limits are reached
Nonclustered indexes with low reads found. The details column contains a list of non-unique nonclustered indexes where the number of index read operations is much less than the number of index write (update) operations.
As data changes, indexes must be updated, which requires CPU and IO resources and slows down DML statements. The resource overhead of updating indexes that have few reads may outweigh their benefit, if any. This overhead is particularly noticeable when there are many indexes with low reads in the database, a phenomenon known as an "over-indexed" database.
If the data in details covers a sufficiently long period during which all critical workloads have run, drop these indexes. Then, monitor performance to validate that no significant negative impact has occurred. If a performance regression is unacceptable, recreate the dropped indexes.
When running query workloads on readable secondary replicas, use the sample query below to collect index read and write statistics from all replicas and add them up for each index. An index that has low reads on the primary replica may have high reads on readable secondary replicas, and may still be required.
-- Index read (seeks, scans, lookups) and write (updates) statistics for each nonclustered index not used for constraint enforcement
SELECT QUOTENAME(OBJECT_SCHEMA_NAME(o.object_id)) AS schema_name,
QUOTENAME(o.name) AS object_name,
QUOTENAME(i.name) AS index_name,
ius.user_seeks,
ius.user_scans,
ius.user_lookups,
ius.user_updates
FROM sys.dm_db_index_usage_stats AS ius
INNER JOIN sys.indexes AS i
ON ius.object_id = i.object_id
AND
ius.index_id = i.index_id
INNER JOIN sys.objects AS o
ON i.object_id = o.object_id
AND
ius.object_id = o.object_id
WHERE ius.database_id = DB_ID()
AND
i.type_desc = 'NONCLUSTERED'
AND
i.is_primary_key = 0
AND
i.is_unique_constraint = 0
AND
i.is_unique = 0
AND
o.is_ms_shipped = 0;
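If analysis confirms an index is not needed, dropping it could look like this (hypothetical names; script out the index definition first so it can be recreated if necessary):
-- Drop a low-read nonclustered index identified in the details column
DROP INDEX [index_name] ON [schema_name].[table_name];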
SQL Server Index Architecture and Design Guide
ROW or PAGE data compression opportunities found. The details column contains a list of partition ranges for each object and index where data compression may be beneficial, based on recent workload sampling and heuristics. The current and suggested new data compression types are shown. The list is sorted with the largest objects first. To improve tip accuracy, obtain this result while a representative workload is running, or shortly thereafter.
In the following example of data in the details column, the suggested data compression is ROW for the first 5 partitions, and PAGE for the last 5 partitions of the clustered index on the usertable table. The total size of all partitions in each range, and a breakdown of that size by storage type, are included. Row overflow and LOB storage are not compressible.
schema: [dbo], object: [usertable], index: [pk_usertable], index type: CLUSTERED, object size (MB): 11,535.46, partition range: 1-5, partition range total size (MB): 5,319.10 (in-row: 5,319.10, row overflow: 0.00, LOB: 0.00), present compression type: ROW, suggested compression type: ROW
schema: [dbo], object: [usertable], index: [pk_usertable], index type: CLUSTERED, object size (MB): 11,535.46, partition range: 6-10, partition range total size (MB): 6,216.36 (in-row: 6,216.36, row overflow: 0.00, LOB: 0.00), present compression type: ROW, suggested compression type: PAGE
ROW compression adds minimal CPU utilization while providing substantial storage and memory savings. For most workloads, ROW compression should be enabled by default for new tables and indexes. PAGE compression adds higher CPU utilization, but also provides higher storage and memory savings.
-- Create an index with ROW compression
CREATE INDEX ix_table1_1 ON dbo.table1 (column1, column2) WITH (ONLINE=ON, RESUMABLE=ON, DATA_COMPRESSION=ROW);
-- Rebuild an index with PAGE compression
ALTER INDEX ix_table1_2 ON dbo.table1 REBUILD WITH (ONLINE=ON, RESUMABLE=ON, DATA_COMPRESSION=PAGE);
Additional analysis and testing based on information in provided links may be required to select data compression types optimally. Any data compression requires additional CPU processing, thus testing is particularly important when low query latency must be maintained in transactional workloads.
This tip may only appear on the primary replica.
Data Compression: Strategy, Capacity Planning and Best Practices
There are recent occurrences of transaction log IO (aka log rate) approaching the limit of the service objective, as noted in the details column. To improve performance of bulk data modifications, including data loading, consider tuning the workload to reduce log IO, or scale up to a service objective with a higher log IO limit.
Additional solutions and workarounds include:
- Using data compression (ROW, PAGE, columnstore) to reduce effective log IO;
- Loading transient data to be further transformed/aggregated into the tempdb database (e.g. using global temporary tables), or into non-durable memory-optimized tables. If using global temporary tables (e.g. ##tablename), set the GLOBAL_TEMPORARY_TABLE_AUTO_DROP database-scoped configuration to OFF to retain global temporary tables until the database engine restarts, as shown in the sample below.
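For reference, the configuration change mentioned in the last item can be made as follows:
-- When OFF, global temporary tables persist until explicitly dropped or until engine restart,
-- instead of being dropped when no longer in use by any session
ALTER DATABASE SCOPED CONFIGURATION SET GLOBAL_TEMPORARY_TABLE_AUTO_DROP = OFF;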
This tip may only appear on the primary replica.
Transaction log rate governance
A significant part of the plan cache is occupied by single-use plans, which have never been reused by query executions and therefore use memory unnecessarily.
To avoid caching single-use plans for a database and reduce memory consumption, enable the OPTIMIZE_FOR_AD_HOC_WORKLOADS database-scoped configuration for each database where single-use plans occupy a large amount of memory, as noted in the details column. Also review application code to ensure that all parameter data type declarations match table column data types exactly, including the length, precision, and scale of the column, and are not derived from each value assigned to the parameter.
ALTER DATABASE SCOPED CONFIGURATION SET OPTIMIZE_FOR_AD_HOC_WORKLOADS = ON;
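To illustrate the parameter typing guidance above, here is a hedged sketch with hypothetical table and column names: declaring a parameter with an explicit type that exactly matches the column definition avoids caching a separate plan per distinct parameter length.
-- The @name parameter is declared as nvarchar(50), matching the hypothetical
-- customer_name column definition, rather than deriving the length from each value
EXEC sys.sp_executesql
    N'SELECT customer_id, customer_name FROM dbo.customers WHERE customer_name = @name;',
    N'@name nvarchar(50)',
    @name = N'Contoso';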
ALTER DATABASE SCOPED CONFIGURATION
Queries could benefit from indexes that do not currently exist. The details column includes data to help create new indexes, based on metadata collected since database engine startup.
For more accurate indexing recommendations based on longer-term historical data, and including CREATE INDEX and DROP INDEX statements, see performance recommendations for this database in Azure portal, or use the Get-AzSqlDatabaseRecommendedAction PowerShell cmdlet.
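The underlying metadata can also be explored directly via the missing index DMVs (a sketch; these views are reset on engine restart):
-- Missing index suggestions accumulated since engine startup,
-- ordered by a rough estimate of potential improvement
SELECT mid.statement AS table_name,
       mid.equality_columns,
       mid.inequality_columns,
       mid.included_columns,
       migs.user_seeks,
       migs.avg_user_impact,
       migs.avg_total_user_cost * migs.user_seeks * (migs.avg_user_impact / 100.0) AS improvement_estimate
FROM sys.dm_db_missing_index_details AS mid
INNER JOIN sys.dm_db_missing_index_groups AS mig
ON mid.index_handle = mig.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS migs
ON mig.index_group_handle = migs.group_handle
ORDER BY improvement_estimate DESC;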
SQL Server Index Architecture and Design Guide
Redo queue on a secondary readable replica is large. Queries running on this replica may experience significant data latency relative to the primary replica. A large redo queue may also cause longer failovers if the replica is used as the failover target.
Common causes of a large redo queue are CPU and/or IO resource contention on the secondary replica, or blocking of the redo thread due to long-running queries on the secondary replica and concurrent DDL operations on the primary replica.
Tune or reduce the workload on the secondary replica to use fewer resources, and/or increase the service objective. Avoid DDL operations such as schema changes, index rebuilds, or statistics updates on the primary replica while long-running queries are executing on the secondary replica.
This tip is produced only on readable secondary replicas, including non-Hyperscale read scale-out replicas and geo-replicas.
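Redo queue size and redo rate can be checked on the secondary replica (a sketch; assumes the sys.dm_database_replica_states DMV referenced in the monitoring documentation below):
-- On a readable secondary replica: redo queue size (KB) and redo rate (KB/sec)
SELECT database_id,
       redo_queue_size,
       redo_rate
FROM sys.dm_database_replica_states;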
Monitoring and troubleshooting read-only replicas
Data IOPS are close to the workload group limit. If workload performance for this database is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, or scale to a service objective with a higher data IOPS limit.
A workload group defines resource governance limits for a single database, or for a database in an elastic pool.
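Recent data IO utilization relative to the governance limit can be checked as follows (a sketch using sys.dm_db_resource_stats, not part of the tips script):
-- Recent data IO utilization, as a percentage of the service objective limit
SELECT end_time,
       avg_data_io_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;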
Resource limits for single databases using the vCore purchasing model
Resource limits for single databases using the DTU purchasing model
Compare the DTU-based service tiers
Workload group IO governance impact is significant. If workload performance for this database is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, or scale to a service objective with a higher data IOPS limit.
A workload group defines resource governance limits for a single database, or for a database in an elastic pool.
Resource limits for single databases using the vCore purchasing model
Resource limits for single databases using the DTU purchasing model
Compare the DTU-based service tiers
Data IOPS are close to the resource pool limit. If workload performance for one or more databases in this elastic pool is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, scale the elastic pool to a service objective with a higher data IOPS limit, or rebalance databases across multiple elastic pools to reduce cumulative data IO in this elastic pool.
A resource pool defines resource governance limits for an elastic pool.
Resource limits for elastic pools using the vCore purchasing model
Resource limits for elastic pools using the DTU purchasing model
Compare the DTU-based service tiers
Resource pool IO governance impact is significant. If workload performance for one or more databases in this elastic pool is insufficient and PAGEIOLATCH* waits are a top wait type, either tune the workload to generate less data IO, scale the elastic pool to a service objective with a higher data IOPS limit, or rebalance databases across multiple elastic pools to reduce cumulative data IO in this elastic pool.
A resource pool defines resource governance limits for an elastic pool.
Resource limits for elastic pools using the vCore purchasing model
Resource limits for elastic pools using the DTU purchasing model
Compare the DTU-based service tiers
Persistent Version Store (PVS) size is large. PVS size, used and allocated data size, maximum database size, and other relevant information are available in the details column.
PVS is a part of the user database, and is used to store row versions when Accelerated Database Recovery (ADR), Read Committed Snapshot Isolation (RCSI), or Snapshot Isolation (SI) are enabled for the database. Large PVS size is usually caused by long-running active transactions. It may also occur due to a large number of aborted transactions that haven't yet been cleaned up by the PVS cleaner background thread. In extreme cases, a large PVS may cause the database to run out of space until PVS space is cleaned up.
Review application patterns to avoid long-running transactions, and make sure to explicitly roll back open transactions after query timeouts and other errors.
If any transactions have been unintentionally left open for a long time and caused PVS to become large, consider killing them to allow PVS cleanup to occur.
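Current PVS size can also be checked directly (a sketch; assumes the sys.dm_tran_persistent_version_store_stats DMV documented for ADR version store statistics):
-- Current PVS size for this database, in MB
SELECT persistent_version_store_size_kb / 1024. AS pvs_size_mb
FROM sys.dm_tran_persistent_version_store_stats
WHERE database_id = DB_ID();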
-- Transactions sorted by duration
SELECT st.session_id,
dt.database_transaction_begin_time,
DATEDIFF(second, dt.database_transaction_begin_time, CURRENT_TIMESTAMP) AS transaction_duration_seconds,
dt.database_transaction_log_bytes_used,
dt.database_transaction_log_bytes_reserved,
st.is_user_transaction,
st.open_transaction_count,
ib.event_type,
ib.parameters,
ib.event_info
FROM sys.dm_tran_database_transactions AS dt
INNER JOIN sys.dm_tran_session_transactions AS st
ON dt.transaction_id = st.transaction_id
OUTER APPLY sys.dm_exec_input_buffer(st.session_id, default) AS ib
WHERE dt.database_id = DB_ID()
ORDER BY transaction_duration_seconds DESC;
-- Kill the session with a transaction open for a long time
KILL 4242;
Manage file space for databases in Azure SQL Database
Paused resumable index maintenance operations found. Paused resumable index operations are aborted automatically after a period of time specified by the PAUSED_RESUMABLE_INDEX_ABORT_DURATION_MINUTES database-scoped configuration.
Either resume these index operations, or, if you are not planning to resume them, abort them explicitly to avoid unnecessary performance impact on the DML operations changing these indexes between now and the time when the auto-abort occurs, as shown in the samples below.
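Paused resumable operations and their progress can be listed from the catalog view referenced below (a sketch):
-- Resumable index operations and their current state
SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id) AS object_name,
       name AS index_name,
       state_desc,
       percent_complete,
       last_pause_time
FROM sys.index_resumable_operations;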
ALTER INDEX [index_name] ON [schema_name].[object_name] RESUME; -- change schema, object, and index names
ALTER INDEX [index_name] ON [schema_name].[object_name] ABORT; -- change schema, object, and index names
This tip may only appear on the primary replica.
sys.index_resumable_operations
A clustered columnstore index (CCI) can be created on one or more tables to improve query and data load performance and drastically reduce storage space and memory consumption by the table.
CCI indexes are most useful in analytical/reporting workloads where queries use large scans. CCI indexes are not recommended in OLTP workloads, or when queries predominantly use index seeks, lookups, and small range scans. They are also not recommended when rows in the table are frequently updated, or when less frequent but large updates occur.
The details column provides a list of CCI candidate tables. Index usage statistics for the recent query workload on each table are included to help determine if a CCI will benefit queries referencing the table.
To improve tip accuracy, obtain this result while a representative workload is running, or shortly thereafter.
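Creating a CCI on a candidate table could look like this (hypothetical object names; validate query performance before and after the change):
-- Create a clustered columnstore index on a heap
CREATE CLUSTERED COLUMNSTORE INDEX cci_table1 ON dbo.table1;
-- Or convert an existing clustered rowstore index, keeping its name
CREATE CLUSTERED COLUMNSTORE INDEX [existing_clustered_index_name] ON dbo.table1 WITH (DROP_EXISTING = ON);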
This tip may only appear on the primary replica.
Columnstore indexes - Design guidance
Columnstore indexes - Query performance
Columnstore indexes - Data loading guidance
Some geo-replication links may be unhealthy. The details column describes geo-replication link health, including data propagation lag (reported on the primary geo-replica only) and replica state.
If a geo-replication link becomes unhealthy during high write load on the primary geo-replica, the service objective of the secondary geo-replica may need to be increased to sustain the data propagation rate and avoid geo-replica unavailability.
For a recently created geo-replica, the SEEDING replication state is expected. In all other cases, the replication state for a healthy geo-replica should be CATCH_UP.
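Link health can be examined on the primary geo-replica with a query like this (a sketch using the DMV linked below):
-- Geo-replication link state and propagation lag, as seen from the primary geo-replica
SELECT partner_server,
       partner_database,
       replication_state_desc,
       replication_lag_sec,
       last_replication
FROM sys.dm_geo_replication_link_status;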
sys.dm_geo_replication_link_status
Last partitions are not empty for some partitioned tables. If the very last partition is not empty and is split to create a new partition for new data, the split operation will be slow and resource-intensive.
If new partitions are periodically added to accommodate new data, make sure to keep a buffer of several empty partitions at the end of the partition list. While a single empty partition is sufficient, multiple empty partitions are preferred: they reduce the risk of data getting into the very last partition if automation fails to split the last empty partition on time.
The details column provides the list of the last several partitions for each table where at least some of them are not empty. If the very last partition (i.e. the partition number equal to the total count of partitions) is empty, act before any data is loaded into it: split it to create at least one empty partition at the end of the partition list. Otherwise, plan and prepare for a time-consuming process of splitting a non-empty partition, to avoid all new data landing in the last partition.
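Row counts in the last partitions of a table can be checked with a query along these lines (a sketch with a hypothetical table name):
-- Row counts per partition, highest partition numbers first
SELECT p.partition_number,
       p.rows
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.table1')
AND
p.index_id IN (0, 1) -- heap or clustered index
ORDER BY p.partition_number DESC;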
ALTER PARTITION FUNCTION [function_name] SPLIT RANGE ([new boundary value]);
Partitioned Tables and Indexes
Blog articles by Kendra Little:
How to Fix Lopsided Partitioned Tables
Sliding Window Table Partitioning: What to Decide Before You Automate
To help in performance investigations, top queries for a time interval are summarized in the details column. This tip is informational in nature, and is produced when any queries have executed in the time interval. On idle databases, top queries may include the queries executed by the get-sqldb-tips.sql script itself. On readable secondary replicas, this tip reports on queries that executed on the primary.
For each query, identified by its query_hash value, the details column includes the associated query_id and plan_id values, the number of query executions by execution type (regular, aborted, exception), and weighted wait categories.
Additionally, each query is ranked by its consumption of the following resources:
- cpu time
- duration
- execution count
- logical IO reads
- physical IO reads
- maximum query memory used
- log bytes used
- tempdb space used
- parallelism
Lower rank numbers mean higher rank, e.g. the highest CPU consuming query has cpu time rank 1.
In the following example, the query with hash 0x5F9043DBAAEBB1FF is the top consumer of tempdb space (tempdb used rank: 1), has two associated query_id values (66 and one more), and three associated plan_id values (36 and two more). It successfully executed 31615 times in the time interval. Most waits (~77.4%) were on CPU.
query hash: 0x5F9043DBAAEBB1FF, query_id: 66 (+1), plan_id: 36 (+2), executions: (regular: 31615, aborted: 0, exception: 0), CPU time rank: 4, duration rank: 32, executions rank: 50, logical IO reads rank: 6, physical IO reads rank: 5, max used memory rank: 79, log bytes used rank: 48, tempdb used rank: 1, parallelism rank: 50, weighted wait categories: CPU (0.774) | Latch (0.137) | Memory (0.057) | Unknown (0.028) | Preemptive (0.002) | Buffer IO (0.002)
Depending on the nature of the performance issue, focus on the queries ranked high for the relevant resource type. For example, if troubleshooting high tempdb space consumption, look at the tempdb used rank.
To change the number of top ranked queries included for each resource type, change the @QueryStoreTopQueryCount variable.
By default, the time interval used is the last hour. You can change the @QueryStoreIntervalMinutes variable to look at shorter or longer intervals. To look at a specific time interval in the past, set the @QueryStoreCustomTimeStart and @QueryStoreCustomTimeEnd variables to the start and end time of the interval. Note that this tip may not appear if the specified interval is shorter than the configured Query Store resource statistics interval length.
For further investigations, use Query Performance Insight in Azure portal, Query Store reports in SSMS, or query the sys.query_store_* views directly.
The query below provides resource consumption details for a given query hash value, as well as the associated query_id and plan_id values and SQL text.
-- Return query/plan runtime statistics in the specified interval for a given query_hash value from the details column for further investigation
DECLARE @QueryHash binary(8) = 0x113AAAF159577E8C; -- replace query hash with the actual value from details
DECLARE @QueryStoreIntervalMinutes int = 60;
DECLARE @QueryStoreCustomTimeStart datetimeoffset -- = '2021-01-01 00:01 +00:00';
DECLARE @QueryStoreCustomTimeEnd datetimeoffset -- = '2021-12-31 23:59 +00:00';
SELECT q.query_id,
q.context_settings_id,
qt.query_sql_text,
OBJECT_SCHEMA_NAME(q.object_id) AS schema_name,
OBJECT_NAME(q.object_id) AS object_name,
p.plan_id,
TRY_CAST(p.query_plan AS xml) AS query_plan,
SUM(IIF(rs.execution_type_desc = 'Regular', rs.count_executions, 0)) AS count_regular_executions,
SUM(IIF(rs.execution_type_desc = 'Aborted', rs.count_executions, 0)) AS count_aborted_executions,
SUM(IIF(rs.execution_type_desc = 'Exception', rs.count_executions, 0)) AS count_exception_executions,
SUM(rs.avg_cpu_time * rs.count_executions) / SUM(rs.count_executions) AS avg_cpu_time,
SUM(rs.avg_duration * rs.count_executions) / SUM(rs.count_executions) AS avg_duration,
SUM(rs.avg_logical_io_reads * rs.count_executions) / SUM(rs.count_executions) AS avg_logical_io_reads,
SUM(rs.avg_physical_io_reads * rs.count_executions) / SUM(rs.count_executions) AS avg_physical_io_reads,
SUM(rs.avg_query_max_used_memory * rs.count_executions) / SUM(rs.count_executions) AS avg_query_max_used_memory,
SUM(rs.avg_log_bytes_used * rs.count_executions) / SUM(rs.count_executions) AS avg_log_bytes_used,
SUM(rs.avg_tempdb_space_used * rs.count_executions) / SUM(rs.count_executions) AS avg_tempdb_space_used,
SUM(rs.avg_dop * rs.count_executions) / SUM(rs.count_executions) AS avg_dop
FROM sys.query_store_query AS q
INNER JOIN sys.query_store_query_text AS qt
ON q.query_text_id = qt.query_text_id
INNER JOIN sys.query_store_plan AS p
ON q.query_id = p.query_id
INNER JOIN sys.query_store_runtime_stats AS rs
ON p.plan_id = rs.plan_id
INNER JOIN sys.query_store_runtime_stats_interval AS rsi
ON rs.runtime_stats_interval_id = rsi.runtime_stats_interval_id
WHERE q.query_hash = @QueryHash
AND
rsi.start_time >= IIF(
(@QueryStoreCustomTimeStart IS NULL OR @QueryStoreCustomTimeEnd IS NULL) AND @QueryStoreIntervalMinutes IS NOT NULL,
DATEADD(minute, -@QueryStoreIntervalMinutes, SYSDATETIMEOFFSET()),
@QueryStoreCustomTimeStart
)
AND
rsi.start_time <= IIF(
(@QueryStoreCustomTimeStart IS NULL OR @QueryStoreCustomTimeEnd IS NULL) AND @QueryStoreIntervalMinutes IS NOT NULL,
SYSDATETIMEOFFSET(),
@QueryStoreCustomTimeEnd
)
GROUP BY q.query_hash,
q.query_id,
q.context_settings_id,
q.object_id,
qt.query_sql_text,
p.plan_id,
p.query_plan
ORDER BY query_id, plan_id
OPTION (RECOMPILE);
Query Performance Insight for Azure SQL Database
Monitoring performance by using the Query Store
Best practices with Query Store
Query Store blog articles by Erin Stellato
Storage space allocated for tempdb data files is close to maximum tempdb data size. The details column provides tempdb allocated, used, and maximum data sizes. If additional tempdb space is needed, consider scaling to a service objective that provides a larger tempdb.
In the Premium and Business Critical service tiers, tempdb shares local SSD space with user databases. Large tempdb allocated space can prevent user database growth and cause out-of-space errors.
As an emergency measure, you can reset tempdb allocated space to its initial small value by failing over the database or elastic pool using PowerShell or REST API. Use Invoke-AzSqlDatabaseFailover or the database failover API for single databases, and Invoke-AzSqlElasticPoolFailover or the elastic pool failover API for elastic pools. A failover will abort any running queries, terminate connections, and cause a short period of unavailability for the database, or for all databases in an elastic pool.
Resource limits for single databases using the vCore purchasing model
Resource limits for elastic pools using the vCore purchasing model
tempdb database in the DTU purchasing model
Queries to monitor tempdb usage
Storage space used in tempdb data files is close to maximum tempdb data size. The details column provides tempdb allocated, used, and maximum data sizes. Continuing growth of used space may cause tempdb out-of-space errors. If additional tempdb space is needed, consider scaling to a service objective that provides a larger tempdb.
As an emergency measure, you can reduce tempdb used size to its initial small value by failing over the database or elastic pool using PowerShell or REST API. Use Invoke-AzSqlDatabaseFailover or the database failover API for single databases, and Invoke-AzSqlElasticPoolFailover or the elastic pool failover API for elastic pools. A failover will abort any running queries, terminate connections, and cause a short period of unavailability for the database, or for all databases in an elastic pool.
Resource limits for single databases using the vCore purchasing model
Resource limits for elastic pools using the vCore purchasing model
tempdb database in the DTU purchasing model
Queries to monitor tempdb usage
Storage space allocated for the tempdb log file is close to maximum tempdb log size. The details column provides tempdb allocated, used, and maximum transaction log sizes. This may be caused by a long-running transaction modifying data in a temporary table, or using tempdb for internal storage.
As an emergency measure, you can reset tempdb allocated log size to its initial small value by failing over the database or elastic pool using PowerShell or REST API. Use Invoke-AzSqlDatabaseFailover or the database failover API for single databases, and Invoke-AzSqlElasticPoolFailover or the elastic pool failover API for elastic pools. A failover will abort any running queries, terminate connections, and cause a short period of unavailability for the database, or for all databases in an elastic pool. Alternatively, you can kill the sessions with long-running transactions in tempdb.
-- Transactions in tempdb sorted by duration
SELECT st.session_id,
dt.database_transaction_begin_time,
DATEDIFF(second, dt.database_transaction_begin_time, CURRENT_TIMESTAMP) AS transaction_duration_seconds,
dt.database_transaction_log_bytes_used,
dt.database_transaction_log_bytes_reserved,
st.is_user_transaction,
st.open_transaction_count,
ib.event_type,
ib.parameters,
ib.event_info
FROM sys.dm_tran_database_transactions AS dt
INNER JOIN sys.dm_tran_session_transactions AS st
ON dt.transaction_id = st.transaction_id
OUTER APPLY sys.dm_exec_input_buffer(st.session_id, default) AS ib
WHERE dt.database_id = 2
ORDER BY transaction_duration_seconds DESC;
-- Kill the session with a transaction open for a long time
KILL 4242;
Queries to monitor tempdb usage
There are recent occurrences of worker utilization approaching the workload group limit of the service objective, as noted in the details column. A workload group defines resource governance limits for a single database, or for a database in an elastic pool.
If the limit is reached, queries and new connection attempts will fail. Common reasons for high worker utilization include:
- excessive parallelism
- large blocking chains
- the number of concurrently executing queries or concurrent connection attempts exceeding the capacity of the service objective
Tune query workload and reduce/stagger connection attempts, and/or scale up to a service objective with a higher worker limit. To find queries that contribute the most to worker utilization in a time interval, review top queries ranked high by parallelism and/or execution count.
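Recent worker utilization relative to the workload group limit can be checked as follows (a sketch using sys.dm_db_resource_stats, not part of the tips script):
-- Recent worker utilization, as a percentage of the workload group limit
SELECT end_time,
       max_worker_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;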
Resource limits for single databases using the vCore purchasing model
Resource limits for single databases using the DTU purchasing model
What happens when worker resource limit is reached
There are recent occurrences of worker utilization approaching the resource pool limit of the service objective, as noted in the details column. A resource pool defines resource governance limits for an elastic pool.
If the limit is reached, queries and new connection attempts will fail for some or all databases in the elastic pool. Common reasons for high worker utilization include:
- excessive parallelism
- large blocking chains
- the number of concurrently executing queries or concurrent connection attempts exceeding the capacity of the service objective
Tune query workloads and reduce/stagger connection attempts, and/or scale up to a service objective with a higher worker limit. To find queries that contribute the most to worker utilization in an elastic pool in a time interval, review top queries for all databases in the pool starting with the most utilized, ranked high by parallelism and/or execution count.
Resource limits for elastic pools using the vCore purchasing model
Resource limits for elastic pools using the DTU purchasing model
What happens when worker resource limit is reached
Notable network connectivity events have recently occurred in this database, or in other databases in the same elastic pool. This includes network errors, slow login attempts, and abnormally closed connections. Retry logic implemented by an application or an MSSQL network driver can prevent these events from impacting applications.
This tip is informational in nature. The presence of network connectivity events does not necessarily indicate a problem; some events, while reported by this tip, may be expected in Azure SQL Database. For example, idle sessions are killed after a period of inactivity by design.
The details column provides a query to view detailed connectivity event data obtained in the latest execution of the script.
To reduce the likelihood of workload-impacting network events, configure the logical server to use the Redirect connection policy.
High CPU utilization by the database engine instance has been sustained for a significant period of time, as noted in the details column. If the database is in an elastic pool, this refers to cumulative CPU utilization by all databases in the same pool.
If workload performance has been inadequate during that time, tune the query workload to consume less CPU, scale up to a service objective with more CPU capacity, or both. To find queries that have consumed the most CPU in a time interval, review top queries for the database, or for the most utilized databases in an elastic pool, ranked high by cpu time.
What happens when CPU resource limits are reached
Identify query performance issues
Application and database tuning guidance
Detectable database performance patterns
Some statistics may be out of date. This may cause poor query performance due to suboptimal query plans. The list of potentially out-of-date statistics on user objects and some system objects is included in the details column.
Update statistics by executing sp_updatestats, UPDATE STATISTICS, or a database maintenance solution, and consider establishing a periodic maintenance procedure to keep statistics up to date.
EXEC sys.sp_updatestats;
Database maintenance solution by Ola Hallengren
Many tables do not have any indexes. This tip appears when a substantial percentage of all tables in the database (excluding very small tables) do not have any indexes. This often causes poor query performance due to suboptimal data retrieval. The list of such tables is included in the details column.
You can enable the CREATE INDEX auto-tuning option for the database to have the system create missing indexes automatically, improving query performance for recent workloads that used these tables, as shown in the sample below.
If you prefer to create indexes manually, review the Index Architecture and Design Guide to learn how to create efficient indexes. If the "Queries could benefit from indexes that do not currently exist" tip also appears, you can derive new indexes from the data in the details column of that tip.
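For reference, the CREATE INDEX auto-tuning option can also be enabled with T-SQL:
-- Enable automatic index creation for the current database
ALTER DATABASE CURRENT SET AUTOMATIC_TUNING (CREATE_INDEX = ON);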
Enable automatic tuning in the Azure portal to monitor queries and improve workload performance