One of the key factors that affects query performance in SQL Server is the accuracy of the estimated number of rows returned by an operator in the execution plan. This estimation is based on statistics, which provide information about the data distribution in the table. However, there are cases where the statistics are outdated or even when they are up-to-date, SQL Server still does not have an accurate estimation.
To improve query performance when the estimated number of rows is wrong and updating statistics does not help, there are several steps you can take.
1. Update Statistics
Up-to-date statistics are crucial for query performance. Operations such as insert, update, delete, or merge can change the data distribution in the table, making the statistics outdated. SQL Server provides settings at the database level to update statistics automatically, but as your database grows and data changes, there may be a need to update statistics manually. This can be done with a weekly job or even more frequently if needed.
Statistics in SQL Server are system objects that contain information about the data distribution in the index key values and sometimes in regular column values. Each index has its own statistics, which are used by SQL Server during query optimization to estimate the number of rows that will be returned for an operation in the query plan.
It’s important to note that statistics are limited in size and only use one 8KB page. Each statistic has three parts, and the third part is the histogram, which represents the distribution of the data. Statistics can have up to 201 steps in the histogram, with each step storing information about larger key intervals. However, this can become a problem when querying for shorter keys that are not big enough to be one of the 201 steps in the statistics histogram.
2. Rewrite the Query
If updating the statistics does not fix the problem, you can consider rewriting the query to improve cardinality estimation. In some cases, rewriting the query can lead to a more accurate estimation of the number of rows returned by each operator in the execution plan.
For example, let’s say you have a query that joins a temporary table with the Users table based on the DisplayName column and sorts the results by the Reputation column. If the estimated number of rows is significantly different from the actual number of rows, you can try rewriting the query to join the temporary table with the Users table based on the Id column instead. This can lead to a more accurate estimation and improve query performance.
Here’s an example of how the rewritten query might look:
SELECT u.DisplayName, u.Location, u.CreationDate,
u.Reputation, u.DownVotes, u.UpVotes,
u.AccountId, u.Views, u.WebsiteUrl, u.LastAccessDate
FROM dbo.Users u INNER JOIN #Temp t
ON u.Id = t.Id
ORDER BY Reputation DESC
By rewriting the query, you can ensure that the estimated number of rows is closer to the actual number of rows, leading to a more efficient execution plan and improved query performance.
Conclusion
Accurate estimation of the number of rows returned by each operator in the execution plan is crucial for query performance in SQL Server. While updating statistics is important, there are cases where it may not improve query performance. In such cases, rewriting the query to improve cardinality estimation can be a viable solution. By ensuring that the estimated number of rows is closer to the actual number of rows, you can optimize the execution plan and achieve better query performance.
Remember to regularly update statistics and consider rewriting queries when necessary to ensure optimal performance in your SQL Server environment.
Article Last Updated: 2023-07-17