When it comes to optimizing SQL Server queries, one common piece of advice is to avoid using functions in the WHERE clause. While this statement may seem vague, it holds true and can significantly impact query performance. In this article, we will explore what this means and how to effectively use functions in the WHERE clause.
For our demonstration, we will be using the AdventureWorks sample database. To better understand the impact of functions on performance, we will also enable the STATISTICS IO feature, which provides valuable statistics about query execution.
USE AdventureWorks2012
GO
SET STATISTICS IO ON
GO
Let’s start by executing a query that utilizes a function in the WHERE clause:
-- SCAN - Select values from SalesOrderDetail
SELECT [SalesOrderID], [SalesOrderDetailID], [ModifiedDate]
FROM [Sales].[SalesOrderDetail]
WHERE DATEDIFF(YEAR, ModifiedDate, GETDATE()) < 0
GO
Upon examining the execution plan and statistics, we can observe that the query is scanning the entire table, even though it does not return any results. It reads 1246 pages from the database.
Table 'SalesOrderDetail'. Scan count 1, logical reads 1246
To improve performance, we can create an index on the ModifiedDate column, as this is the column used in the function:
-- Create Index on ModifiedDate
CREATE NONCLUSTERED INDEX [IX_SalesOrderDetail_ModifiedDate] ON [Sales].[SalesOrderDetail] ([ModifiedDate])
GO
After creating the index, let’s rerun the same query:
-- SCAN - Select values from SalesOrderDetail
SELECT [SalesOrderID], [SalesOrderDetailID], [ModifiedDate]
FROM [Sales].[SalesOrderDetail]
WHERE DATEDIFF(YEAR, ModifiedDate, GETDATE()) < 0
GO
Despite the index, the query still scans the entire table and reads over 338 pages from the database. This is because the function DATEDIFF is evaluated at runtime, requiring the SQL Server Query engine to scan the entire table to retrieve the necessary data.
To avoid this performance issue, we can rewrite the query to eliminate the use of the function:
-- SEEK - Select values from SalesOrderDetail
SELECT [SalesOrderID], [SalesOrderDetailID], [ModifiedDate]
FROM [Sales].[SalesOrderDetail]
WHERE ModifiedDate > GETDATE()
GO
By rewriting the query, we are now retrieving records with a future date, effectively achieving the same result. Upon executing the query, we can see that the logical reads have reduced to just 3, and the execution plan displays a Seek operation.
Table 'SalesOrderDetail'. Scan count 1, logical reads 3
It is important to note that in our case, both queries return the same result. However, the simple act of rewriting the query without using a function in the WHERE clause has made a significant difference in performance.
Finally, to clean up, you can run the following script to drop the newly created index:
-- Cleanup
DROP INDEX [IX_SalesOrderDetail_ModifiedDate] ON [Sales].[SalesOrderDetail]
GO
In conclusion, when working with SQL Server, it is crucial to be mindful of the performance implications of using functions in the WHERE clause. By avoiding functions and finding alternative ways to achieve the desired result, you can greatly improve query performance and optimize your database operations.