Have you ever encountered a situation where you need to find the gaps in a sequence of values in SQL Server? In this article, we will discuss different approaches to achieve this.
Recently, a user named Enrico asked a question about generating a SQL script that would show the gaps in a sequence. Let’s take a look at the question and the solutions provided by experts Brian and Tejas.
Enrico’s question:
INV0096 INV0097 INV0099 INV0100 INV0102 INV0103
Enrico wanted to generate a SQL script that would show the gaps in the sequence, such as INV0098 and INV0101, or even just the numbers 98 and 101.
Brian Tkatch provided two solutions to tackle this problem. Let’s take a look at the first solution:
WITH Data(Datum) AS ( SELECT 'INV0096' UNION ALL SELECT 'INV0097' UNION ALL SELECT 'INV0099' UNION ALL SELECT 'INV0100' UNION ALL SELECT 'INV0102' UNION ALL SELECT 'INV0103' ), CTE AS ( SELECT CAST(SUBSTRING(MIN(Datum), 4, 4) AS INT) Start, CAST(SUBSTRING(MAX(Datum), 4, 4) AS INT) Finish FROM Data UNION ALL SELECT Start + 1, Finish FROM CTE WHERE Start < Finish ) SELECT Common.Formatted FROM CTE CROSS APPLY(SELECT 'INV' + RIGHT('0000' + CAST(Start AS VARCHAR(4)), 4)) Common(Formatted) WHERE NOT EXISTS ( SELECT * FROM Data WHERE Data.Datum = Common.Formatted ) OPTION (MAXRECURSION 0);
Brian’s first solution uses a recursive common table expression (CTE) to generate a sequence of numbers based on the minimum and maximum values in the data. It then checks for the existence of each number in the data and returns the missing values.
Brian also provided a second solution, which is similar to the first one but without using the CROSS APPLY:
WITH Data(Datum) AS ( SELECT 'INV0096' UNION ALL SELECT 'INV0097' UNION ALL SELECT 'INV0099' UNION ALL SELECT 'INV0100' UNION ALL SELECT 'INV0102' UNION ALL SELECT 'INV0103' ), CTE AS ( SELECT CAST(SUBSTRING(MIN(Datum), 4, 4) AS INT) Start, CAST(SUBSTRING(MAX(Datum), 4, 4) AS INT) Finish FROM Data UNION ALL SELECT Start + 1, Finish FROM CTE WHERE Start < Finish ) SELECT 'INV' + RIGHT('0000' + CAST(Start AS VARCHAR(4)), 4) FROM CTE WHERE NOT EXISTS ( SELECT * FROM Data WHERE Data.Datum = 'INV' + RIGHT('0000' + CAST(Start AS VARCHAR(4)), 4) ) OPTION (MAXRECURSION 0);
Tejas Shah also provided a solution using a table variable:
DECLARE @Test TABLE (Data VARCHAR(10)) INSERT INTO @test SELECT 'INV0096' UNION ALL SELECT 'INV0097' UNION ALL SELECT 'INV0099' UNION ALL SELECT 'INV0100' UNION ALL SELECT 'INV0102' UNION ALL SELECT 'INV0103' UNION ALL SELECT 'INV0106' UNION ALL SELECT 'INV0110' ;WITH cte1 AS( SELECT CAST(RIGHT(Data,4) AS INT) AS RowID FROM @Test ), Missing AS( SELECT MIN(RowID) AS MissNum, MAX(RowID) AS MaxID FROM Cte1 UNION ALL SELECT MissNum + 1, MaxID FROM Missing WHERE MissNum < MaxID ) SELECT missnum FROM Missing LEFT JOIN cte1 tt ON tt.Rowid = Missing.MissNum WHERE tt.Rowid IS NULL OPTION (MAXRECURSION 0);
Tejas’ solution uses a recursive CTE with a table variable to find the missing values in the sequence.
These solutions provide different approaches to finding gaps in a sequence in SQL Server. Depending on your specific requirements and the size of your data, you can choose the solution that best suits your needs.
Do you know any other tricks to achieve the same result? Share your thoughts in the comments below!