Deleting Duplicate rows the smart way using OVER PARTITION in MS SQL Part II

In my previous blog on this I made it very simple where all the data was a duplicate, someone pinged me and asked well that is well and good. But what if there is duplicate records, with only partial duplicate records.

Well very easy to answer, so lets look at the same data with a few modifications.

insert into Emp_Details (Emp_Name, Company, Join_Date, Resigned_Date)
values ('John', 'Software', '20060201', '20061231')
,('John', 'Software', '20060301', '20061231')
,('John', 'Software', '20060401', '20061231')
,('John', 'SuperSoft', '20070101', '20071231')
,('John', 'UltraSoft', '20070201', '20080131')
,('John', 'ImproSoft', '20080201', '20081231')
,('John', 'ImproSoft', '20080201', '20081231')
,('Mary', 'Software', '20060101', '20081231')
,('Mary', 'SuperSoft', '20090101', '20090531')
,('Mary', 'SuperSoft', '20090101', '20090531')
,('Mary', 'UltraSoft', '20090601', '20100531')
,('Mary', 'UltraSoft', '20090601', '20100531')

This time we have modified John's Join Date to be a month apart of each other.

And if we use the older method we would not get the desired results, so lets look at the query again shall we.

select *, COUNT(*) as Occurrences from Emp_Details
group by Emp_Name, Company, Join_Date, Resigned_Date

And if we run this code we would get the following output.

SqlDeleteTest2

Do you notice how it thinks that the records for John and the company Software, with different join dates, are not duplicate rows?

Well we can fix this with using Over Partition by modifying the query, so that it now looks like this.

select Emp_Name
   ,Company
   ,Join_Date
   ,Resigned_Date
   ,ROW_NUMBER() over (partition by Emp_Name, Company order by Join_Date) RowNumber
from Emp_Details

The changes are minor, and we are now saying that we want to partition on the Emp_Name and Company and that we are going to have it sorted on the Join_Date. We could very easily modify this so that the last join date is occurrence one by modifying the order by to desc, and we can now see that our results are what we would expect to get as well.

SqlDeleteTest3

And to delete these we would just use the same way of deleting, as we did in the previous post. As you can see there is a lot more greater control on how and what you can delete, and it makes our jobs so much easier than once before.