Getting maximum consecutive years in T-SQL using Common Table Expressions (CTE)

This is an expansion (and slight variation) of Joe’s answer in StackOverflow regarding the thread Find the maximum consecutive years for each ID’s in a table (Oracle SQL) using SQL Server. I’ve tested this in SQL Server 2014, but should work from SQL Server 2005 onwards where the ranking functions are supported.

The problem Joe solved is not uncommon – i.e. how to get the consecutive years of something (consecutive year sales, consecutive years in school, consecutive years volunteering etc), and his solution is pretty clever.

Let’s assume you are working on a sales database. You have multiple clients who could be purchasing from your store several times a year, or a couple of times every five years. You may want to know what’s the maximum consecutive years they’ve purchased from you. Why? Perhaps in a marketing campaign, you may want to give your loyal customers (purchased in 5 or more consecutive years) a special discount.

Your data may look like the following screenshot. Notice that in this example, Client 00001, 00002 and 00003 purchased only once. Client 00004 purchased several times, and it looks like there were purchases in consecutive years.

Download sample T-SQL script to create and populate the SAMPLESALES table

01 All Revenue

Let’s take it step by step to understand both the problem and solution better. It will be easy to visually identify which clients have purchased consecutive years if we first display all the unique years that client has purchased:

SELECT 
    DISTINCT
    CLIENTID,
    CLIENTNAME,
    YEAR(REVENUEDATE) REVENUEYEAR
FROM
    SAMPLESALES

Here’s the result:

Client 00004

Here we can see Client 00004 purchased in 2004 and 2005 (2 consecutive years), but did not purchase in 2006. Starting 2007, Client 0004 started purchasing again every year until 2012 (6 consecutive years).

First thing we can do to try and understand how we can solve this issue is to do a self join to see what’s the difference between the years in the records. What will also help us is getting how many unique years are in the table between the two years we are looking at, which we accomplish below using a subquery. Note that the CTE (Common Table Expression) that I am using already provides just distinct years per client.

SELECT
    c1.CLIENTID,
    c1.CLIENTNAME,
    c1.REVENUEYEAR YEAR1,
    c2.REVENUEYEAR YEAR2,
      -- year spread between the records
    c2.REVENUEYEAR - c1.REVENUEYEAR YEARDIFF,
      -- how many records are in our table that is 
      -- within this date range
    (
        SELECT 
            COUNT(*) 
        FROM 
            revenueyears_cte c3
        WHERE 
            (c3.CLIENTID = c1.CLIENTID)  
            AND (c3.REVENUEYEAR BETWEEN c1.REVENUEYEAR AND c2.REVENUEYEAR) )
  NUMRECORDSINBETWEEN
FROM
    revenueyears_cte c1
    LEFT JOIN revenueyears_cte c2
    ON c1.CLIENTID = c2.CLIENTID
    AND c1.REVENUEYEAR < c2.REVENUEYEAR 

Using Client 0004, I’ve highlighted a couple of records.

04 Year Difference and Spread

The first one in Row 16 where year1 = 2005 and year 2 = 2012 gives us a YEARDIFF of 7 (because 2012 minus 2005 is 7), and the number of records in between is 7 as well. In reality, if we had records for this client for each year from 2005 to 2012, we should have 8 records not 7.

This means we are looking for the record that has the highest NUMRECORDSINBETWEEN but also where NUMRECORDSINBETWEEN is higher than YEARDIFF. So, although Row 16 gives us a higher NUMRECORDSINBETWEEN, it is in fact Row 21 that gives us the maximum number of consecutive years.

Once we know that, we can simply pull the records where NUMRECORDSINBETWEEN is less than the spread. In my query I also want to retain clients that have purchased only for a single year, which are records where NUMRECORDSINBETWEEN is 0. In the query I want to re-assign these clients a 1 for CONSECUTIVEYEARS:

     SELECT 
        CLIENTID, 
        CLIENTNAME, 
        CASE 
           WHEN NUMRECORDSINBETWEEN = 0 THEN 1 
           ELSE NUMRECORDSINBETWEEN 
        END AS CONSECUTIVEYEARS
     FROM 
        revenueyears_cte2 
     WHERE 
        (NUMRECORDSINBETWEEN > YEARDIFF)
         OR (NUMRECORDSINBETWEEN = 0)

All the prep work is done. Now we can simply use the MAX() aggregate function to get the record with the most consecutive years.

If you want, we can even throw in a subquery with FOR XML PATH to pull all the years, so we can visually check. (FOR XML is great at concatenating all these strings, but it’s not without it’s cons. Remember this is a subquery that will go through all your records, so please test before you use in your environments especially if you have large data sets).

05 Max Consecutive Years and Year List

We can check that:

  • Client 00004 purchased from 2004-2005 which is 2 consecutive years, missed 2006, and then again from 2007-2012. Consecutive years is correctly counted from 2007-2012 which is 6.
  • Client 00007 purchased 2009 and 2010, and consecutive years is correctly counted at 2
  • Client 00008 purchased 2002 and 2004, which are not consecutivey, and consecutive years is correctly counted at 1

Here is the full script, which utilizes CTEs (Common Table Expressions) to simplify the query and make it more readable:

;
WITH    
-- clients with distinct years
revenueyears_cte
  AS (
      SELECT DISTINCT
        CLIENTID,
        CLIENTNAME,
        YEAR(REVENUEDATE) REVENUEYEAR
      FROM
        SAMPLESALES2
     ),
-- get the year spread and number of records in between
revenueyears_cte2
  AS (
      SELECT
        c1.CLIENTID,
        c1.CLIENTNAME,
        c1.REVENUEYEAR YEAR1,
        c2.REVENUEYEAR YEAR2,
        c2.REVENUEYEAR - c1.REVENUEYEAR YEARDIFF,
        (
           SELECT 
              COUNT(*) 
           FROM 
              revenueyears_cte c3
           WHERE (c3.CLIENTID = c1.CLIENTID) 
              AND (c3.REVENUEYEAR BETWEEN c1.REVENUEYEAR AND c2.REVENUEYEAR) 
         ) NUMRECORDSINBETWEEN      
      FROM
        revenueyears_cte c1
        LEFT JOIN revenueyears_cte c2
        ON c1.CLIENTID = c2.CLIENTID
        AND c1.REVENUEYEAR < c2.REVENUEYEAR 
     ) ,
 revenueyears_cte3 
 AS
   (
     SELECT 
        CLIENTID, 
        CLIENTNAME, 
        CASE 
           WHEN NUMRECORDSINBETWEEN = 0 THEN 1 
           ELSE NUMRECORDSINBETWEEN 
        END AS CONSECUTIVEYEARS
     FROM 
        revenueyears_cte2 
     WHERE 
        (NUMRECORDSINBETWEEN > YEARDIFF)
         OR (NUMRECORDSINBETWEEN = 0)
   )

SELECT
    CLIENTID,
    CLIENTNAME, 
    MAX(CONSECUTIVEYEARS) MAX_CONSECUTIVEYEARS
    ,
    STUFF((
           SELECT DISTINCT
            ', ' + CAST(REVENUEYEAR AS VARCHAR(4))
           FROM
            revenueyears_cte t
           WHERE
            t.CLIENTID = cte3.CLIENTID
           ORDER BY
            ', ' + CAST(REVENUEYEAR AS VARCHAR(4))
          FOR
           XML PATH('')
          ), 1, 2, '') YEARSGIVING
FROM
    revenueyears_cte3 cte3
GROUP BY
    CLIENTID,
    CLIENTNAME
ORDER BY
    CLIENTNAME

You can vary this solution a bit more, by utilizing ranking functions similar to Joe’s original solution.

That’s it. Hope this helped explain Joe’s elegant solution in StackOverflow regarding the thread Find the maximum consecutive years for each ID’s in a table (Oracle SQL) .

Advertisements

5 comments

  • I believe the query can be written with more performance when executed

    WITH cteSource
    AS (
    SELECT ClientID,
    ClientName,
    MAX(COUNT(*)) OVER (PARTITION BY ClientName) AS MaxConsecutiveYears
    FROM (
    SELECT ClientID,
    ClientName,
    DATEPART(YEAR, RevenueDate) – ROW_NUMBER() OVER (PARTITION BY ClientID ORDER BY DATEPART(YEAR, RevenueDate)) AS theGrp
    FROM dbo.SampleSales
    GROUP BY ClientID,
    ClientName,
    DATEPART(YEAR, RevenueDate)
    ) AS d
    GROUP BY ClientID,
    ClientName,
    theGrp
    )
    SELECT s.ClientID,
    s.ClientName,
    s.MaxConsecutiveYears,
    MAX(STUFF(f.Data, 1, 2, ”)) AS YearsGiving
    FROM cteSource AS s
    CROSS APPLY (
    SELECT ‘, ‘ + DATENAME(YEAR, x.RevenueDate)
    FROM dbo.SampleSales AS x
    WHERE x.ClientID = s.ClientID
    GROUP BY DATENAME(YEAR, x.RevenueDate)
    ORDER BY DATENAME(YEAR, x.RevenueDate)
    FOR XML PATH(”)
    ) AS f(Data)
    GROUP BY s.ClientID,
    s.ClientName,
    s.MaxConsecutiveYears
    ORDER BY s.ClientName;

    Like

  • Pingback: BPOTW 2015-08-21 | SQL Notes From The Underground

  • Stripping it down to the minimal data, how would this work for you?

    CREATE TABLE Test
    (client_name CHAR(5) NOT NULL,
    something_year INTEGER NOT NULL,
    PRIMARY KEY(client_name, something_year);

    WITH X
    AS
    (SELECT client_name, something_year,
    LAG(something_year) OVER (PARTITION BY client_name
    BETWEEN ROWS 5 PRECEDING AND CURRENT ROW)
    AS prior_something_year
    FROM TEST)

    SELECT X.client_name, X.something_year,
    FROM X
    WHERE X.something_year – 4 = X.prior_something_year;

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s