Published on

December 19, 2021

Understanding Hash Indexes in SQL Server

SQL Server 2014 introduces hash indexes for Memory-Optimized tables, but you may not know what they are and the differences between hash indexes and standard SQL Server indexes. Also, you may be asking why there are no clustered indexes on Memory-Optimized tables. Keep reading to find the answers.

What is a hash index?

Basically, a hash index is an array of N buckets or slots, each one containing a pointer to a row. Hash indexes use a hash function F(K, N) in which given a key K and the number of buckets N, the function maps the key to the corresponding bucket of the hash index. The buckets do not store the keys or its hashed value. They only store the memory address in which the data is placed.

What is a hash function?

A hash function is any algorithm that maps data of variable length to data of a fixed length in a deterministic and close to random way. A very simple hash function would be a string that returns its length, so F(“John”) = 4 and F(“Ed”) = 2. If we define a hash index of 5 buckets using this function, then the pointers that point to “John” and “Ed” are stored at buckets 4 and 2 respectively.

However, collisions can occur when two different keys produce the same hash value. For example, if we apply the same hash function on “Bill”, which also has a length of 4, it will produce the same hash value as “John”. This is called a “collision” and is very common in hash functions.

SQL Server Hekaton Collision Management

In order to handle hash collisions, SQL Server’s Hekaton engine uses chaining with linked lists. When a row is inserted and hashes to a non-empty bucket, the engine links the new row to the existing one by setting a pointer on the existing row to the new row. This allows for efficient collision management and retrieval of data.

Pros and Cons of Hash Indexes in SQL Server Hekaton

Hash indexes have their advantages and disadvantages. Here are some examples:

  • Hash indexes are efficient for equality comparisons, such as searching for a specific key value.
  • Hash indexes do not support range searches or inequality comparisons, which can result in a table scan.
  • Hash indexes are not ordered, so an ORDER BY operation will add a SORT step to the execution plan.
  • Hash indexes only support whole keys for searching. Searching by a partial key or using LIKE comparisons will result in an index scan.

It is important to understand the differences between hash and range indexes in order to make the right choice for your specific use case.

Overall, hash indexes can be a powerful tool for achieving maximum performance in Memory-Optimized tables, but they require careful planning and consideration of their limitations.

References:

Hekaton: SQL Server’s Memory-Optimized OLTP Engine, Diaconu C., Freedman C., Ismert E, Larson P., Mittal P., Stonecipher R., Verma N., Zwilling M.

Delaney K. SQL Server In-Memory OLTP Internals Overview for CTP2

Click to rate this post!
[Total: 0 Average: 0]

Let's work together

Send us a message or book free introductory meeting with us using button below.