Deleting Duplicates using CTE and ROW_NUMBER Ranking Function

This is just a simple demonstration of how you can delete duplicates using CTE (Common Table Expressions) and the ROW_NUMBER() ranking function.

Assume you have the following table called SampleTable:

table with duplicates

 

One approach to deleting a duplicate is to first assign sequential numbers that get reset for each group of unique records. We can do this by using the ROW_NUMBER() function and specifying a PARTITION BY clause. The columns in the PARTITION BY should identify the unique records in your table. In the example below, it is enough to specify StudentID.

; with cte AS
(
SELECT
ROW_NUMBER() OVER (PARTITION BY StudentID ORDER BY StudentID) RowNum,
*
FROM SampleTable
)

If we were to select from this CTE, this is what you will get. Notice RowNum = 1 is repeated for every new instance of StudentID.

cte

To delete duplicates, you simply need to delete from the CTE where RowNum is greater than 1:

DELETE FROM cte
WHERE RowNum > 1

Advertisements

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s