When it comes to querying data in SQL Server, there are several operators that can be used to filter and retrieve the desired results. In this article, we will explore the concepts of NOT EXISTS and NOT IN operators and discuss their differences and performance implications.
First, it is important to note that NOT EXISTS and NOT IN are not equivalent in all cases, especially when NULL values are involved. Unlike EXISTS and IN, these operators will return different results when the subquery returns even one NULL value.
Let’s take a closer look at the NOT IN operator. When using NOT IN, the comparison is done by checking each individual value against the subquery results. If any row in the subquery returns NULL, the entire NOT IN operator will evaluate to either FALSE or NULL, resulting in no records being returned.
On the other hand, the EXISTS operator solely checks for the presence or absence of a row in the subquery. It cannot return NULL and can only return true or false. Therefore, when the column in the subquery that’s used for comparison with the outer table can have nulls, it is important to carefully consider whether to use NOT EXISTS or NOT IN.
Now, let’s discuss the performance implications of these operators. In a benchmark-style setup, we can compare the speed of NOT EXISTS and NOT IN when the columns involved in the comparison are defined as NULL and when they are defined as NOT NULL.
In our tests, we created two tables: BigTable and SmallerTable. We inserted data into these tables and created indexes on the join columns. We then executed two queries: one using NOT IN and the other using NOT EXISTS.
When the columns are defined as NOT NULL, both operators perform similarly. The execution plans and characteristics are identical, resulting in similar CPU and elapsed times.
However, when the columns are defined as nullable, the performance differences become apparent. The NOT IN operator takes significantly longer to execute and performs thousands of times more reads compared to NOT EXISTS. The execution plans for NOT IN are more complex, which may explain the performance difference.
Based on these findings, it is important to consider the following takeaways:
- NOT EXISTS and NOT IN do not have the same behavior when NULL values are involved. Choose carefully based on your specific requirements.
- Columns that will never contain NULL values should be defined as NOT NULL to optimize performance and avoid complex execution plans.
- On non-nullable columns, the behavior and performance of NOT IN and NOT EXISTS are similar, so choose the one that works best for your situation.
By understanding the differences and performance implications of NOT EXISTS and NOT IN operators, you can make informed decisions when writing SQL queries and optimize the performance of your database operations.