In the world of SQL Server, indexes play a crucial role in optimizing query performance. They help the database engine quickly locate and retrieve the required data, resulting in faster query execution times. However, understanding how indexes work and how the query optimizer interacts with them is essential for efficient database design and query writing.
In a previous article, we discussed the importance of defining indexes based on the columns used in equality expressions in the WHERE clause. In this article, we will explore a practical example to see how the optimizer interacts with the index wizard.
Let’s consider the following query:
SET STATISTICS IO, TIME ON GO SELECT [Status], SalesPersonID, ModifiedDate, TotalDue FROM Sales.SalesOrderHeaderBig WHERE [Status] = 4 AND SalesPersonID = 289 AND ModifiedDate >= '1/1/2014' ORDER BY [Status], SalesPersonID, ModifiedDate GO SET STATISTICS IO, TIME OFF GO
When executing this query, we get a result set of 58,967 rows with 29,295 logical reads and a CPU time of 588 ms. The execution plan shows a clustered index scan with a query cost of 24.4894.
The query plan also suggests a missing covering index:
CREATE NONCLUSTERED INDEX IX_SalesOrderHeader_StatusSalesPersonIDModifiedDate ON Sales.SalesOrderHeaderBig ([Status] ASC, SalesPersonID ASC, ModifiedDate ASC) INCLUDE (TotalDue)
Interestingly, the suggested index columns, Status and SalesPersonID, are in the same order as the WHERE clause. The first two columns use an ‘=’ connector, and the third column uses a ‘>=’ connector.
Now, let’s create three different indexes, each with the same columns as the suggested index but in a different order:
CREATE NONCLUSTERED INDEX IX_SalesOrderHeader_StatusSalesPersonIDModifiedDate ON Sales.SalesOrderHeaderBig ([Status] ASC, SalesPersonID ASC, ModifiedDate ASC) INCLUDE (TotalDue) WITH (Data_Compression = ROW) GO CREATE NONCLUSTERED INDEX IX_SalesOrderHeader_SalesPersonIDStatusModifiedDate ON Sales.SalesOrderHeaderBig (SalesPersonID ASC, [Status] ASC, ModifiedDate ASC) INCLUDE (TotalDue) WITH (Data_Compression = ROW) GO CREATE NONCLUSTERED INDEX IX_SalesOrderHeader_ModifiedDateStatusSalesPersonID ON Sales.SalesOrderHeaderBig (ModifiedDate ASC, [Status] ASC, SalesPersonID ASC) INCLUDE (TotalDue) WITH (Data_Compression = ROW) GO
Now, if we rerun the original query, we still get the same result set of 58,967 rows. However, the execution plan uses the second index, not the suggested one. This raises the question of why the query optimizer suggested one index but used a different one in the actual execution.
By examining the data and how each column is populated, we can gain insights into the query plan selection. The suggested index and the second index both allow the optimizer to directly access the specified rows based on the equality operators for Status and SalesPersonID. The only difference lies in how the data is stored within the index pages.
The suggested index and the second index have similar performance, with slightly higher CPU time, logical reads, and query cost for the suggested index. On the other hand, the third index, which has ModifiedDate as the first column, requires skipping multiple rows to find the next applicable Status and SalesPersonID combination. This results in considerably more page reads and a higher query cost.
Based on this example, we can draw the following conclusions:
- When writing queries and defining indexes, consider the criteria used in the WHERE clause.
- Index column selection and order should prioritize equality criteria first and then take advantage of ordering in greater than or less than WHERE clauses.
Understanding how the query optimizer interacts with indexes is crucial for optimizing query performance in SQL Server. By carefully designing indexes based on the query requirements, you can significantly improve the efficiency of your database operations.
For more information and the complete script used in this article, please refer to the attached file.