Published on

October 9, 2018

Understanding Parallelism and Clustered Index in SQL Server

Every SQL Server consulting engagement brings new insights and discussions. Recently, during a Comprehensive Database Performance Health Check, I had an interesting conversation about parallelism for heap scan. Let me share the conversation and clarify some misconceptions.

Mark: Pinal, do we need a clustered index on this table?

Pinal: Of course, we already have a primary key on the table and based on the workload analysis, I strongly believe we should create a clustered index on it.

Mark: John, didn’t we discuss yesterday that we need to have a clustered index on the table? Currently, there is no clustered index on the table, making it a heap. When we retrieve data from a heap, SQL Server does not utilize parallelism.

John: Yeah, I get that now. I never knew it before. I will definitely create a clustered index on the table so we can take advantage of parallelism.

While the conversation seemed to be heading in the right direction, I realized that I needed to clarify some doubts.

Pinal: Hold on, I need to provide some clarification here. Let’s create a clustered index on the table based on the workload analysis. However, it’s important to note that the presence of a clustered index does not directly impact parallelism and heap scan in SQL Server.

To demonstrate this, let’s create a table and populate it with data:

CREATE TABLE TempTable (
  ID INT,
  Col1 CHAR(100),
  Col2 CHAR(100)
)

-- Insert into table
INSERT INTO TempTable (ID, Col1, Col2)
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY a.name) RowID,
'Bob','Brown'
FROM sys.all_objects a
CROSS JOIN sys.all_objects b

Now, let’s enable the execution plan and run the following query:

-- SELECT Tests
SELECT *
FROM TempTable
WHERE ID = 1000

If we examine the execution plan, we will notice that the query is using parallel threads to read the data from the table, even though it is a heap. This demonstrates that SQL Server can use parallel operations to read data from a heap as well.

It’s important to understand that the decision to create a clustered index should be based on the specific workload analysis and not solely on the desire to utilize parallelism. Parallelism can be leveraged regardless of whether the table is a heap or has a clustered index.

Remember, when working with SQL Server, it’s crucial to analyze the workload and make informed decisions to optimize performance.

Feel free to drop the temporary test table using the following script:

DROP TABLE TempTable

By understanding the relationship between parallelism, clustered index, and heap scan, you can make better decisions when optimizing your SQL Server environment.

Click to rate this post!
[Total: 0 Average: 0]

Let's work together

Send us a message or book free introductory meeting with us using button below.