Published on July 25, 2006

Exploring the Power of Numbers Tables in SQL Server

Introduction

“You must unlearn what you have learned” – Yoda

What exactly is a numbers table and more importantly, what can it do for you? To put it simply, a numbers table is a table with one INT column. That single column, which also serves as the primary key and clustered index, contains the natural (or “counting”) numbers starting with 1 and going up to some specified upper limit.

Make Your Own Numbers Table!

Let me start by saying that every database should have its own numbers table. They’re very useful (as we’ll see later), and extremely easy to create. We’ll start with the creation of the numbers table. This will work in SQL Server 2000 or SQL Server 2005:

CREATE TABLE dbo.Numbers (Num INT NOT NULL PRIMARY KEY CLUSTERED);

Well that was simple enough. Now comes the fun part – populating it. There are lots of methods to populate the numbers table, but we’ll start with the most obvious solution first. This method is a simple WHILE loop that will populate our numbers table one row at a time:

DECLARE @i INT;
SELECT @i = 1;
WHILE @i <= 10000
BEGIN
    INSERT INTO dbo.Numbers (Num) VALUES (@i);
    SELECT @i = @i + 1;
END;

This simple procedural solution works on both SQL 2000 and SQL 2005. There are numerous other clever ways to create a numbers table. Here’s one of my personal favorites:

-- Be sure to drop the numbers table if it exists
DROP TABLE dbo.Numbers;
-- Now re-create it and fill it with sequential numbers starting at 1
SELECT TOP 10000 IDENTITY(INT, 1, 1) AS Num
INTO dbo.Numbers
FROM master.INFORMATION_SCHEMA.COLUMNS i1
CROSS JOIN master.INFORMATION_SCHEMA.COLUMNS i2;
-- Add a primary key/clustered index to the numbers table
ALTER TABLE dbo.Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Num);

Notice that this set-based method offers a significant speed improvement over the WHILE loop version. The query relies on the fact that a CROSS JOIN of a table with itself produces a result set whose row count is the square of the table's row count. If the INFORMATION_SCHEMA.COLUMNS view in your master database has over 400 rows (not unreasonable), the CROSS JOIN will generate over 160,000 rows; if you need more, just add another CROSS JOIN. We're using the TOP keyword on the SELECT statement to limit the results to 10,000 rows for our purposes. Feel free to populate your numbers table with as many numbers as you need, up to the upper limit for an INT, which happens to be 2,147,483,647.
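On SQL Server 2005 you can also generate the sequence with a recursive CTE. The following is just a sketch, assuming dbo.Numbers does not yet exist; the MAXRECURSION 0 hint lifts the default 100-level recursion limit:

```sql
-- SQL 2005 only: populate the numbers table from a recursive CTE.
-- A sketch, assuming dbo.Numbers has already been dropped;
-- OPTION (MAXRECURSION 0) removes the default 100-level recursion limit.
WITH NumberCTE (Num) AS
(
    SELECT 1
    UNION ALL
    SELECT Num + 1 FROM NumberCTE WHERE Num < 10000
)
SELECT Num
INTO dbo.Numbers
FROM NumberCTE
OPTION (MAXRECURSION 0);

-- Add the primary key/clustered index as before
ALTER TABLE dbo.Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Num);
```

The CROSS JOIN method above is generally faster for large ranges, but the CTE version has no dependency on the row counts of system views.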

The Numbers Table in Action

The power of the numbers table is that it can be used to convert procedural solutions to set-based solutions. Consider the ubiquitous comma-delimited string-split function. As before we’ll start with the procedural version which basically consists of a WHILE loop. This code works on SQL 2000 and SQL 2005:

CREATE FUNCTION dbo.fnProceduralSplit (@string VARCHAR(8000))
RETURNS @ret TABLE (
    Num INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    String VARCHAR(255))
AS
BEGIN
    DECLARE @i INT, @j INT;
    SELECT @i = 1;
    SELECT @j = CHARINDEX(',', @string);
    WHILE (@j > 0)
    BEGIN
        INSERT INTO @ret (String) VALUES (SUBSTRING(@string, @i, @j - @i));
        SELECT @i = @j + 1;
        SELECT @j = CHARINDEX(',', @string, @i);
    END;
    INSERT INTO @ret (String) VALUES (SUBSTRING(@string, @i, LEN(@string) - @i + 1));
    RETURN;
END
GO 

All this code does is loop through the string passed in as a parameter looking for commas. It breaks out the strings between any commas it finds. It’s probably about as efficient as you can get for procedural code. So now let’s see what the equivalent function looks like in a set-based version on SQL 2000 and SQL 2005. In this version we’ll let SQL Server do all the heavy lifting. We’ll use an inline table-valued function for performance, and we’ll return the number representing the position of each comma in the list to keep the relative ordering of strings in the list:

CREATE FUNCTION dbo.fnSetSplit (@String VARCHAR(8000))
RETURNS TABLE AS
RETURN (
    SELECT Num,
        SUBSTRING(@String,
            CASE Num WHEN 1 THEN 1 ELSE Num + 1 END,
            CASE CHARINDEX(',', @String, Num + 1)
                WHEN 0 THEN LEN(@String) - Num + 1
                ELSE CHARINDEX(',', @String, Num + 1) - Num -
                    CASE WHEN Num > 1 THEN 1 ELSE 0 END
            END) AS String
    FROM dbo.Numbers
    WHERE Num <= LEN(@String)
        AND (SUBSTRING(@String, Num, 1) = ',' OR Num = 1)
)

You can test the functionality of your new UDFs with queries like the following:

SELECT * FROM dbo.fnSetSplit('Jack,Jill,Hill,Water');
SELECT * FROM dbo.fnSetSplit('Alexander The Great');
SELECT * FROM dbo.fnSetSplit('Red,Green,Blue');
SELECT * FROM dbo.fnProceduralSplit('Army,Navy,Air Force,Marines');
SELECT * FROM dbo.fnProceduralSplit('Washington,Jefferson,Adams');
SELECT * FROM dbo.fnProceduralSplit('1,2,3,4,5,6,7,8');

On an individual basis, for single queries like those above you might not notice a large difference in performance, but for thousands of such queries in a short time period, the difference can be significant.
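Where the set-based split really shines is when its output is joined directly to another table, for example to implement a parameterized IN-list. Here's a sketch against Northwind's dbo.Products table; the CAST assumes every element of the list is a valid integer ID:

```sql
-- A sketch: use the split function's output as a join target to filter
-- dbo.Products by a comma-delimited list of IDs.
-- Assumes every list element casts cleanly to INT.
SELECT p.ProductID, p.ProductName
FROM dbo.Products p
INNER JOIN dbo.fnSetSplit('1,3,5,7') s
    ON p.ProductID = CAST(s.String AS INT);
```

Because fnSetSplit is an inline table-valued function, the optimizer can fold it into the surrounding query plan rather than materializing an intermediate table variable.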

Here Comes the Second Example

OK, still not convinced? Then let’s try another example. Here we’ll eliminate duplicate side-by-side letters from a string. This is a commonly performed operation in phonetic encoding and approximate search routines, and here we’ll do it in a simple set-based UDF that will work on SQL 2000 or 2005:

CREATE FUNCTION dbo.fnReduceDupes (@string VARCHAR(8000))
RETURNS VARCHAR(8000) AS
BEGIN
    DECLARE @Result VARCHAR(8000);
    SELECT @Result = @string;
    SELECT @Result = STUFF(@Result, Num, 1,
        CASE SUBSTRING(@Result, Num, 1)
            WHEN SUBSTRING(@Result, Num + 1, 1) THEN '!'
            ELSE SUBSTRING(@Result, Num, 1)
        END)
    FROM dbo.Numbers
    WHERE Num <= LEN(@Result);
    SELECT @Result = REPLACE(@Result, '!', '');
    RETURN @Result;
END

This UDF reduces sequences of side-by-side duplicate characters to a single character. It does this by first replacing all but the last of any sequence of side-by-side duplicate characters with an exclamation point ('!', though any character that cannot legitimately appear in the input would work). It then uses the Transact-SQL REPLACE function to remove all exclamation points from the string. Again, this function would be a perfect candidate for a WHILE loop, but the numbers table lets us tell SQL Server to do the hard work for us. One caveat: the multi-row variable assignment depends on SQL Server processing the rows in Num order, which the clustered index encourages but which is not formally documented behavior.

Here are some samples to try:

SELECT dbo.fnReduceDupes('AAAABBBBBCCDDDDEEEEFFGGGGHIIIIJJKK'); -- ABCDEFGHIJK
SELECT dbo.fnReduceDupes('HHHHHIIIIII TTTTHHHEEEERRRRRRREEEE'); -- HI THERE

One Last Time

OK, so we can see the usefulness of a numbers table in creating set-based, position-dependent string functions. But all the world is not a string, and often we'll need to apply business logic that has nothing to do with string manipulation. Let's consider the SQL 2000 Northwind database for a moment. This database has a very simple "inventory" system of sorts, the kind you might create for a very basic web store. The dbo.Products table has a single column named UnitsInStock; this is your inventory. The items ordered and their quantities are stored in the ProductID and Quantity columns of the dbo.[Order Details] table.

The problem is this: we need to match up the items in inventory to specific customer orders. The only real issue we might face is that the total quantity of a particular product on order may exceed the total quantity of that product in inventory; that is, we've sold or promised more than we have on hand. In a more detailed inventory tracking system (such as the SQL 2005 AdventureWorks shelf/bin inventory system), we face more potential issues: a customer might order more of a product than one bin contains, forcing us to pull from multiple bins to fill a single order, to assign items from bins to customer orders, and so on.

A row-by-row cursor might seem like the obvious solution for this problem. After all, we need to assign each item we have in inventory to a customer order detail row. Also we need to keep track of the inventory quantity to make sure we don’t over-commit and try to ship more product than we have on hand. Or do we?

Here I’ll present a SQL 2000 Northwind solution to this problem:

SELECT Order_Items.OrderID, Product_Inventory.ProductID, COUNT(*) AS ItemsFromInventory
FROM (
    SELECT p.ProductID, n.Num
    FROM dbo.Numbers n
    INNER JOIN dbo.Products p ON n.Num BETWEEN 1 AND p.UnitsInStock
) Product_Inventory
INNER JOIN (
    SELECT o.OrderID, o.ProductID,
        (
            SELECT COALESCE(SUM(o1.Quantity), 0)
            FROM dbo.[Order Details] o1
            WHERE o1.ProductID = o.ProductID AND o1.OrderID < o.OrderID
        ) + n.Num AS Num
    FROM dbo.Numbers n
    INNER JOIN dbo.[Order Details] o ON n.Num BETWEEN 1 AND o.Quantity
) Order_Items
    ON Product_Inventory.ProductID = Order_Items.ProductID
    AND Product_Inventory.Num = Order_Items.Num
GROUP BY Order_Items.OrderID, Product_Inventory.ProductID
ORDER BY Order_Items.OrderID, Product_Inventory.ProductID;

OK, so what just happened here? To begin with, we joined each product in inventory to the numbers table using ON n.Num BETWEEN 1 AND p.UnitsInStock. What this does is number each individual item we have in inventory. By way of example, if we have two of Product ID 998 in stock and three of Product ID 999 in stock, the Product_Inventory subquery will generate the following result:

ProductID  Num
998        1
998        2
999        1
999        2
999        3

Next we assigned a number to each item on each order. This is the Order_Items subquery. It's more complex than the Product_Inventory subquery because we have to account for both ProductID and OrderID. The complexity comes from the fact that the items ordered and their quantities are spread out over several rows on several different orders, yet we want the count to run from 1 to n, where n is the total quantity of a single ordered item across all orders. Functionally we're just replicating SQL 2005's ROW_NUMBER function, which will do this exact same thing for us when the optional PARTITION BY clause is specified.

We INNER JOIN these two subqueries by their ProductID and Num columns, which effectively assigns one item from inventory to one item on each order. We don't have to worry about over-committing inventory, since any ordered item whose running number exceeds UnitsInStock simply finds no matching row in Product_Inventory and drops out of the join.

After assigning each inventory item to an order detail item, we group the results. The COUNT(*) of each group gives us the total number of each item from inventory that was assigned to a given order.

The SQL 2005 version is similar, although the ROW_NUMBER function and CTEs will probably make the job easier. Of course, as mentioned, the AdventureWorks sample database uses a shelf/bin inventory system that's slightly more complex. I'll leave the SQL 2005 version of this query as an exercise for the reader.
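As a hint, the running-total numbering in the Order_Items subquery above can be sketched with ROW_NUMBER on SQL 2005. This is a hypothetical rewrite of just that subquery, with Northwind column names assumed:

```sql
-- SQL 2005 sketch: ROW_NUMBER replaces the correlated running-total
-- subquery used to number ordered items across orders.
-- A hypothetical rewrite of the Order_Items subquery only.
SELECT o.OrderID, o.ProductID,
    ROW_NUMBER() OVER (PARTITION BY o.ProductID
                       ORDER BY o.OrderID, n.Num) AS Num
FROM dbo.Numbers n
INNER JOIN dbo.[Order Details] o ON n.Num BETWEEN 1 AND o.Quantity;
```

The PARTITION BY o.ProductID clause restarts the numbering for each product, exactly what the COALESCE(SUM(...)) subquery accomplishes by hand in the SQL 2000 version.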

Conclusions

Numbers tables are an extremely useful tool for writing efficient set-based code to replace cursors and other procedural constructs.
