When it comes to optimizing SQL Server performance, there are many factors to consider. One of them is the hash bucket count used by hash indexes on In-Memory OLTP (memory-optimized) tables. In this article, we will explore how the bucket count affects performance and discuss best practices for setting it.
Recently, I had a session with a customer who was experiencing poor performance even though they were using In-Memory OLTP on their SQL Server 2014 system. Upon further investigation, I discovered that they had set a very low bucket count in the hash index definition of their memory-optimized table. This intrigued me and prompted me to delve deeper into the topic.
In SQL Server, a hash index distributes rows across a fixed number of hash buckets. Microsoft's guidance is to set the bucket count to between one and two times the number of unique index key values. If the count is too low, you get hash collisions: multiple distinct key values land in the same bucket and form long chains that must be walked on every insert and lookup. Hash collisions can severely degrade performance and should be avoided.
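Before changing anything, it helps to measure. SQL Server exposes hash index health through the sys.dm_db_xtp_hash_index_stats DMV: a large share of empty buckets suggests the count is too high, while an average chain length well above 1 signals collisions. A quick diagnostic query:

```sql
-- Inspect hash index health: many empty buckets means the bucket count
-- is too high; an avg_chain_length much greater than 1 means it is too low.
SELECT  OBJECT_NAME(hs.object_id)  AS table_name,
        i.name                     AS index_name,
        hs.total_bucket_count,
        hs.empty_bucket_count,
        hs.avg_chain_length,
        hs.max_chain_length
FROM    sys.dm_db_xtp_hash_index_stats AS hs
JOIN    sys.indexes AS i
        ON  i.object_id = hs.object_id
        AND i.index_id  = hs.index_id;
```

Run this in the database that holds the memory-optimized tables; it returns one row per hash index.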
To demonstrate the impact of hash collisions, I created a script that compares the performance of inserting 1 million rows into two different tables with different hash bucket values.
First, I created a database called “InMem_HashCollision” and defined a memory-optimized table called “InMem_Data1” with a large bucket count of 1,048,576. I then inserted 1 million rows into this table and tracked the time taken for the operation.
-- create a simple memory-optimized (XTP) table with a bucket count of 1048576
CREATE TABLE dbo.InMem_Data1
(
    Col1 INT NOT NULL
        PRIMARY KEY NONCLUSTERED
        HASH WITH (BUCKET_COUNT = 1048576),
    Col2 INT NOT NULL,
    Col3 INT NOT NULL
)
WITH
(
    MEMORY_OPTIMIZED = ON,
    DURABILITY = SCHEMA_AND_DATA
);
GO
-- create a natively compiled stored procedure to insert data
CREATE PROCEDURE dbo.InsertInMem_Data1
(
    @Rows INT
)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
(
    TRANSACTION ISOLATION LEVEL = SNAPSHOT,
    LANGUAGE = 'us_english'
)
    DECLARE @counter INT = 0;
    WHILE @counter < @Rows
    BEGIN
        INSERT INTO dbo.InMem_Data1 (Col1, Col2, Col3)
        VALUES (@counter, @counter, @counter);
        SET @counter += 1;
    END
END;
GO
-- execute the stored procedure to insert 1000000 rows
DECLARE
@starttime DATETIME = CURRENT_TIMESTAMP,
@endtime DATETIME;
EXECUTE dbo.InsertInMem_Data1 1000000;
SET @endtime = CURRENT_TIMESTAMP;
PRINT CONCAT('The query executed in ',
DATEDIFF(millisecond, @starttime, @endtime), ' milliseconds.');
The above script completed the operation in approximately 1 second.
Next, I created another table called “InMem_Data2” with a much smaller bucket count of 1,024, inserted the same 1 million rows, and tracked the time taken.
-- create a table with a smaller hash bucket count
CREATE TABLE dbo.InMem_Data2
(
    Col1 INT NOT NULL
        PRIMARY KEY NONCLUSTERED
        HASH WITH (BUCKET_COUNT = 1024),
    Col2 INT NOT NULL,
    Col3 INT NOT NULL
)
WITH
(
    MEMORY_OPTIMIZED = ON,
    DURABILITY = SCHEMA_AND_DATA
);
GO
-- create a natively compiled stored procedure to insert data
CREATE PROCEDURE dbo.InsertInMem_Data2
(
    @Rows INT
)
WITH NATIVE_COMPILATION, SCHEMABINDING, EXECUTE AS OWNER
AS
BEGIN ATOMIC WITH
(
    TRANSACTION ISOLATION LEVEL = SNAPSHOT,
    LANGUAGE = 'us_english'
)
    DECLARE @counter INT = 0;
    WHILE @counter < @Rows
    BEGIN
        INSERT INTO dbo.InMem_Data2 (Col1, Col2, Col3)
        VALUES (@counter, @counter, @counter);
        SET @counter += 1;
    END
END;
GO
-- execute the stored procedure to insert 1000000 rows
DECLARE
@starttime DATETIME = CURRENT_TIMESTAMP,
@endtime DATETIME;
EXECUTE dbo.InsertInMem_Data2 1000000;
SET @endtime = CURRENT_TIMESTAMP;
PRINT CONCAT('The query executed in ',
DATEDIFF(millisecond, @starttime, @endtime), ' milliseconds.');
The second query took close to 47 seconds to complete, dramatically longer than the first. The arithmetic explains why: with only 1,024 buckets for 1 million unique values, each bucket ends up holding a chain of roughly 1,000 rows (1,000,000 / 1,024 ≈ 977), and every insert must walk that chain while maintaining the index. This is the cost of hash collisions.
It is important to note that the impact of hash collisions is not limited to insert operations. Reads suffer too: a point lookup has to hash the key to its bucket and then scan the entire chain in that bucket, comparing keys until it finds a match. Therefore, it is crucial to carefully consider and size hash bucket counts for your specific use case.
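To illustrate the read-side overhead, consider a point lookup against the two example tables above (a sketch; exact timings will vary by hardware). Against InMem_Data1 the target bucket typically holds a single row, while against InMem_Data2 the same seek has to scan a chain of roughly 1,000 rows:

```sql
-- Point lookup against the well-sized index: the target bucket
-- usually holds one row, so the seek is effectively O(1).
SELECT Col2, Col3
FROM dbo.InMem_Data1
WHERE Col1 = 123456;

-- Same lookup against the under-sized index: Col1 hashes to one of
-- only 1,024 buckets, and SQL Server must walk the whole chain
-- (~1,000 rows here), comparing keys until it finds the match.
SELECT Col2, Col3
FROM dbo.InMem_Data2
WHERE Col1 = 123456;
```

A single lookup is too fast to time meaningfully on its own, but the difference becomes visible under concurrent workloads or when the queries are run in a loop.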
As a best practice, estimate the number of unique key values your table will eventually hold and size BUCKET_COUNT accordingly before implementing In-Memory OLTP tables in your SQL Server environment. By setting an appropriate bucket count up front, you can ensure optimal performance and avoid unnecessary bottlenecks.
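If you discover an under-sized index after deployment, you do not always have to start over. On SQL Server 2016 and later, the bucket count can be changed in place with ALTER TABLE ... ALTER INDEX ... REBUILD; on SQL Server 2014, as in this article's scenario, the table has to be dropped and recreated. Note that the index name below is an assumption: the inline PRIMARY KEY in the earlier scripts gets a system-generated name unless you declare one explicitly.

```sql
-- SQL Server 2016+ only: resize the hash index in place.
-- PK_InMem_Data2 is a hypothetical name; check sys.indexes for the
-- actual (possibly system-generated) primary key index name first.
ALTER TABLE dbo.InMem_Data2
ALTER INDEX PK_InMem_Data2
REBUILD WITH (BUCKET_COUNT = 2097152);
```

The bucket count is always rounded up to the next power of two, so 2,097,152 (2^21) is what you get whether you ask for 1.5 million or 2 million buckets.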
Remember, the best way to optimize performance is to understand the underlying concepts and make informed decisions based on your specific requirements.
Happy optimizing!