SQL Server 2012 introduced a powerful feature called FIRST_VALUE and LAST_VALUE. These functions allow us to retrieve the first and last values in a specified column within a group of rows. In this article, we will explore how these functions work and discuss an interesting puzzle related to their behavior.
Introduction to FIRST_VALUE and LAST_VALUE
Before we dive into the puzzle, let’s quickly review the basics of FIRST_VALUE and LAST_VALUE. These functions are used in conjunction with the OVER clause to perform calculations on a specific column within a partition of rows.
For example, consider the following query:
SELECT s.SalesOrderID,
s.SalesOrderDetailID,
s.OrderQty,
FIRST_VALUE(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID ORDER BY SalesOrderDetailID ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS FstValue,
LAST_VALUE(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID ORDER BY SalesOrderDetailID ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LstValue
FROM Sales.SalesOrderDetail s
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY s.SalesOrderID, s.SalesOrderDetailID, s.OrderQty
This query retrieves the SalesOrderID, SalesOrderDetailID, OrderQty, and the first and last values of the SalesOrderDetailID column for each SalesOrderID. The PARTITION BY clause divides the rows into groups based on the SalesOrderID, and the ORDER BY clause specifies the order in which the rows are processed.
The Puzzle
Now, let’s move on to the puzzle. In the previous query, if we change the ORDER BY clause of the OVER clause to OrderQty, the resultset changes. Specifically, the FIRST_VALUE and LAST_VALUE functions return unexpected values.
For example, consider the following query:
SELECT s.SalesOrderID,
s.SalesOrderDetailID,
s.OrderQty,
FIRST_VALUE(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID ORDER BY OrderQty ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS FstValue,
LAST_VALUE(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID ORDER BY OrderQty ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LstValue
FROM Sales.SalesOrderDetail s
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY s.SalesOrderID, s.SalesOrderDetailID, s.OrderQty
In this query, rows with the same SalesOrderID have different values for the FIRST_VALUE and LAST_VALUE functions. The FIRST_VALUE returns a value greater than the LAST_VALUE, which is unexpected.
The Solution
The solution to this puzzle lies in the order of the columns selected in the SELECT and ORDER BY clauses. By changing the order of the columns, we change the order of the rows in the resultset, but the data remains the same.
Consider the following query:
SELECT s.OrderQty,
s.SalesOrderID,
s.SalesOrderDetailID,
FIRST_VALUE(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID ORDER BY OrderQty ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS FstValue,
LAST_VALUE(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID ORDER BY OrderQty ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LstValue
FROM Sales.SalesOrderDetail s
WHERE SalesOrderID IN (43670, 43669, 43667, 43663)
ORDER BY s.OrderQty, s.SalesOrderID, s.SalesOrderDetailID
In this query, the resultset is ordered by OrderQty, SalesOrderID, and SalesOrderDetailID. The FIRST_VALUE and LAST_VALUE functions now return the expected values.
Conclusion
Understanding the behavior of SQL Server functions like FIRST_VALUE and LAST_VALUE is crucial for writing efficient and accurate queries. By paying attention to the order of columns in the SELECT and ORDER BY clauses, we can ensure that these functions return the desired results.
Leave a comment below with your detailed answer to the puzzle by November 15th for a chance to win a print copy of my new book, “SQL Server Interview Questions”. If you already have this book, you can opt for any of my other books, “SQL Wait Stats” and “SQL Programming”. Good luck!