Published on

April 5, 2010

Understanding SQL Server Statistics Sampling

When working with SQL Server, it is important to understand how statistics are sampled and the impact it can have on query performance. In this article, we will explore the default sampling behavior of statistics in SQL Server and how it can affect the accuracy of query results.

By default, SQL Server samples statistics when they are created or updated. The sampling rate can be specified by the user and can range from a very low value to 100%. Let’s perform a small experiment to verify if the auto update on statistics is enabled and examine the sampling behavior on a large table.

USE [AdventureWorks]
GO

-- Create Table
CREATE TABLE [dbo].[StatsTest] (
    [ID] [int] IDENTITY (1, 1) NOT NULL,
    [FirstName] [varchar] (100) NULL,
    [LastName] [varchar] (100) NULL,
    [City] [varchar] (100) NULL,
    CONSTRAINT [PK_StatsTest] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]
GO

-- Insert 1 Million Rows
INSERT INTO [dbo].[StatsTest] (FirstName, LastName, City)
SELECT TOP 1000000 'Bob',
    CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name) % 2 = 1 THEN 'Smith' ELSE 'Brown' END,
    CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name) % 10 = 1 THEN 'New York'
         WHEN ROW_NUMBER() OVER (ORDER BY a.name) % 10 = 5 THEN 'San Marino'
         WHEN ROW_NUMBER() OVER (ORDER BY a.name) % 10 = 3 THEN 'Los Angeles'
         ELSE 'Houston' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO

-- Update the statistics
UPDATE STATISTICS [dbo].[StatsTest]
GO

-- Show the statistics
DBCC SHOW_STATISTICS ("StatsTest", PK_StatsTest)
GO

-- Clean up
DROP TABLE [dbo].[StatsTest]
GO

From the result of the DBCC SHOW_STATISTICS command, we can see that the result set is sampled for a large dataset. The percentage of sampling is based on the data distribution and the type of data in the table. Before dropping the table, let’s check the size of the table, which is 35 MB.

Now, let’s run the same code with a smaller number of rows:

USE [AdventureWorks]
GO

-- Create Table
CREATE TABLE [dbo].[StatsTest] (
    [ID] [int] IDENTITY (1, 1) NOT NULL,
    [FirstName] [varchar] (100) NULL,
    [LastName] [varchar] (100) NULL,
    [City] [varchar] (100) NULL,
    CONSTRAINT [PK_StatsTest] PRIMARY KEY CLUSTERED ([ID] ASC)
) ON [PRIMARY]
GO

-- Insert 1 Hundred Thousand Rows
INSERT INTO [dbo].[StatsTest] (FirstName, LastName, City)
SELECT TOP 100000 'Bob',
    CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name) % 2 = 1 THEN 'Smith' ELSE 'Brown' END,
    CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name) % 10 = 1 THEN 'New York'
         WHEN ROW_NUMBER() OVER (ORDER BY a.name) % 10 = 5 THEN 'San Marino'
         WHEN ROW_NUMBER() OVER (ORDER BY a.name) % 10 = 3 THEN 'Los Angeles'
         ELSE 'Houston' END
FROM sys.all_objects a
CROSS JOIN sys.all_objects b
GO

-- Update the statistics
UPDATE STATISTICS [dbo].[StatsTest]
GO

-- Show the statistics
DBCC SHOW_STATISTICS ("StatsTest", PK_StatsTest)
GO

-- Clean up
DROP TABLE [dbo].[StatsTest]
GO

In this case, the sample rate is 100% as the number of rows sampled is the same as the total number of rows in the table. The size of the table is less than 4 MB.

Let’s compare the result sets for reference:

Test 1: Total Rows: 1,000,000, Rows Sampled: 255,420, Size of the Table: 35.516 MB

Test 2: Total Rows: 100,000, Rows Sampled: 100,000, Size of the Table: 3.555 MB

The reason behind the sampling in Test 1 is that the data space is larger than 8 MB and uses more than 1024 data pages. If the data space is smaller than 8 MB and uses less than 1024 data pages, then the sampling does not happen (or happens 100%). Sampling helps in reducing excessive data scan, but it can also reduce the accuracy of the data.

Please note that this is just a sample test and cannot be claimed as a benchmark test. The results may vary on different machines.

There is much more to learn about SQL Server statistics and their impact on query performance. In a future post, we will cover this subject in more detail. If you feel that something is missing or have any questions, please feel free to add them in the comments section.

Click to rate this post!
[Total: 0 Average: 0]

Let's work together

Send us a message or book free introductory meeting with us using button below.