When working with SQL Server, you may come across the terms “seek” and “scan” in relation to index operations. Understanding the difference between these two operations can help you optimize your queries and improve performance.
A seek operation involves navigating down the index’s b-tree structure to locate a specific row or a range of rows based on a given predicate. This predicate must be in a form that can be used as a search argument (SARGable). On the other hand, a scan operation reads the leaf level of an index, potentially including intermediate pages as well.
Let’s take a look at some examples to better understand these concepts. We’ll use a simple Numbers table for demonstration purposes:
CREATE TABLE Numbers (
Number INT NOT NULL PRIMARY KEY CLUSTERED
);
INSERT INTO Numbers (Number)
SELECT TOP (1000000) ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM msdb.sys.columns c1 CROSS JOIN msdb.sys.columns c2;
With the Numbers table created, let’s examine the number of pages in its clustered index:
SELECT OBJECT_NAME(object_id) AS TableName, index_level, page_count, record_count, avg_record_size_in_bytes
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('Numbers'),1,NULL, 'Detailed');
In this case, we have a three-level deep clustered index with 1608 pages at the leaf level and a total of 1614 pages across all levels.
Now, let’s explore some basic queries to see the difference between seeks and scans. We’ll start with a simple select statement that retrieves all rows from the Numbers table:
SET STATISTICS IO ON;
SELECT * FROM Numbers;
The result of this query shows a scan operation with a scan count of 1 and logical reads of 1615. This means that the entire index was read, including all pages at every level.
Next, let’s try a seek operation by querying for a specific number:
SELECT * FROM Numbers WHERE Number = 137;
In this case, the result shows a seek operation with a scan count of 0 and logical reads of 3. Since we have a three-level deep clustered index, it makes sense that the seek operation requires three reads to locate the desired row.
Now, what happens when we perform a query without a predicate? For example:
SELECT TOP (1) * FROM Numbers;
Since there is no predicate, a seek operation is not possible. Instead, the query is implemented as a scan operation. However, this scan does not read the entire table. The result shows a scan count of 1, logical reads of 3, and physical reads of 0. This means that the scan operation only read one page of the leaf level, likely the root or intermediate pages used to locate the first page in the leaf. The scan stopped after reading just one row because that was all that was needed. This behavior is due to the presence of a row goal in the query.
There are other cases where a scan operation won’t read the entire index leaf level. For example, when performing aggregations like finding the minimum or maximum value of an indexed column:
SELECT MIN(Number) FROM Numbers;
In this case, the result shows a scan count of 1, logical reads of 3, and physical reads of 0. The same applies to the EXISTS operator:
IF EXISTS(SELECT 1 FROM Numbers)
SELECT 1;
In conclusion, a seek operation requires a predicate, while a scan operation does not. However, it’s important to note that a scan operation doesn’t always mean that the entire table is read. Depending on the query and the presence of row goals or other factors, a scan operation may only read a subset of the index’s leaf level.
Understanding the difference between seeks and scans can help you optimize your queries by choosing the appropriate index and query structure. By minimizing unnecessary scans and leveraging seeks where possible, you can improve the performance of your SQL Server database.