Thursday, March 11, 2010

Using MAX(datetime) and Sub-Queries to Find the Most Recent Item

Today I was preparing a query that pulled some information from the inventory and sales tables of our ERP. The query itself was pretty easy to craft, but the VP of Sales threw a wrench into the mix; it should only contain information regarding the most recent sales order. Let's imagine my query was providing output like this table. Of course the actual query was more complex, but this gives the important information to understand the problem.

Part_No Rev On_Hand Safety_Stock SO_No SO_Date 
12345 10 15 S1234 12/14/2009
12345 10 15 S1233 10/01/2009 
12345 10 15 S1232 08/02/2009 
12346 S1231 08/01/2009 
12347 S1230 10/20/2009 
12347 S1229 07/15/2009 

What I actually needed the query to deliver were lines 1, 4, and 5. These are only the most recent sales orders associated with each part number.

karthik_padbanaban, a fellow member of SQL Team Forums, posted a solution to another member's question, which I was able to adapt to my problem.

part_no VARCHAR(5),
rev CHAR,
on_hand TINYINT,
safety_stock TINYINT,
so_no VARCHAR(5),
so_date DATETIME

SELECT '12345', 'A', 10, 15, 'S1234', '12/14/2009' UNION ALL
SELECT '12345', 'A', 10, 15, 'S1233', '10/01/2009' UNION ALL
SELECT '12345''A', 10, 15, 'S1232', '08/02/2009' UNION ALL
SELECT '12346', '',  5, 0, 'S1231', '08/01/2009' UNION ALL
SELECT '12347', '-', 0, 0, 'S1230', '10/20/2009' UNION ALL
SELECT '12347', '-', 0, 0, 'S1229', '07/15/2009'

WHERE so_date =
    SELECT MAX(so_date)
    FROM @A AS B
    WHERE B.part_no = A.part_no AND B.Rev = A.Rev

part_no rev on_hand safety_stock so_no so_date 
12347 - 0 0 S1230 10/20/2009 
12346 5 0 S1231 8/1/2009 
12345 A 10 15 S1234 12/14/2009 

So the trick is in understanding why that query works, when just using MAX(so_date) in the principal query doesn't. Obviously if I use MAX(so_date) on the principal query it does nothing because there is only the single virtual table output by the query. We need to wrap the virtual table upon iteself so that we can create a relationship that will give us a specific result sets based on the part number/rev relationship and not just the highest (max) value for all the dates in the sales orders table. I believe this will be easier if we consider how the Query Engine processes the query.

Order Line of Code 
5WHERE so_date = (
4SELECT MAX(so_date)
3WHERE B.part_no = A.part_no AND B.Rev = A.Rev )

Tracking this process we see that first all the data is brought together in a virtual table (1), which we will be referenced as A. Then another, new virtal table is created. This is called B (2) where A and B have the same part number and rev (3) and then the Query Engine retrives those items that have the highest date (4) from B. Next the Query engine pulls the result sets from A that have the same value as in step 4 (5). Finally the required columns are selected from the virtual table in step 5, which gives us our result (6).

Let's consider a related idea that is easier to visualize mentally because it's done via a JOIN operation. The query below is taken from SQL Server 2008 Transact-SQL Recipes by Joseph Sack, which is an amazing book. It also uses table aliases to wrap a single table onto itself and then extract information out. While these examples are different in a practical way I think they illustrate a similar approach and it was Joseph Sack's example I was trying to use as my spring board for my own answer when I found Karthik's query.

USE AdventureWorks2008

SELECT s.BusinessEntityID,
   SUM(s2004.SalesQuota) Total_2004_SQ,
   SUM(s2003.SalesQuota) Total_2003_SQ

FROM Sales.SalesPerson s
LEFT OUTER JOIN Sales.SalesPersonQuotaHistory s2004 ON s.BusinessEntityID = s2004.BusinessEntityID
AND YEAR(s2004.QuotaDate)= 2004

LEFT OUTER JOIN Sales.SalesPersonQuotaHistory s2003 ON
s.BusinessEntityID = s2003.BusinessEntityID
AND YEAR(s2003.QuotaDate)= 2003

GROUP BY s.BusinessEntityID


Post a Comment