Flagging Duplicate Data in Same Table With Common Table Expression and Correlation Update

Question

My methodical madness is not working...I am missing something. For the first time, I am tasked with cleaning up dupllicates in thesame table. I have googled and found many things like deleting with common table expression, etc...but nothing really that I can use.

My address table is such:

Address
--------
id
add1
add2
city 
state
zip
parentidofthisdup

I want to get duplicates and the row number. I consider the the id of row number 1 the parent. to ny susequent rows of dup addresses that I pull back, I would like to stamp those with the id of the parent in arentidofthisdup. I will end up keeping the parent and disposing of the ones that have the parent id in parentidofthisdup.

I am trying to make this update by doing a Common Table Expression and then using the cte in a correlation update, but yikes, tis not working. All I get is that all records were update but only null vals result in parentidofthisdup.

Maybe I am not coding this in the right way. I'm fairly new at mass updates.

-- My common table expression of the set that I want stamped
with tbFlagTheseWithPk as
(
Select * from 

(
 select  
    myaddress.id,
    myaddress.parentidofthisdup,              
    myaddress.add1, 
    myaddress.add2,
    myaddress.state,
    myaddress.zip,
    row_number() over (partition by add1, state, zip order by add1, state, zip, add2) as [rn]   
    from myaddress
  where     add1 !=''
) as a
where a.rn > 1)

-- Now use our Common Table Expression using a correlated subquery to make them children of rn 1

Update tbFlagTheseWithPk
set 
set parentidofthisdup = 
(   Select id from                                                                                                  
     (Select * from  
    (   select      myaddress.pkey,                                                                          myaddress.parentidofthisdup,                                                                   myaddress.add1,                                                                                 myaddress.add2,
myaddress.state,
myaddress.zip,
row_number() over (partition by add1, state, zip order by a1, state, zip, add2) as [rn]
from myaddress where add1 !=''   
    ) as a                                                                                                  
    where a.rn > 1) as  b  

    where   b.a1 = tbFlagTheseWithPk.add1                                                                                
    and 
 b.state = tbFlagTheseWithPk.state
 and
 b.zip = tbFlagTheseWithPk.zip

    and 
 tbFlagTheseWithPk.rn = 1

Isn't there a nicer way of doing this? How do I get over this mass update learning curve? I feel that I should be able to do this in one elegant way, but if I don't figure this out soon, I am going to resort to looping over a cursor and turning a blind eye to the beauty of SQL...but that would be a tragedy.

score 1 · Accepted Answer · edited May 23 '17 at 12:04

Never use a cursor.

You are on the right tracks. These links might help SQL Server - inner join when updating , Row_Number http://msdn.microsoft.com/en-us/library/ms186734.aspx , CTE http://msdn.microsoft.com/en-us/library/ms190766(v=sql.105).aspx

DECLARE  @myAddress table
(id int, parentidofthisdup int, add1 nvarchar(10),add2 nvarchar(10) , [state] nvarchar(10),zip nvarchar(10) ) ;


Insert into @myAddress Values(1,null,'a','b','c','d');
Insert into @myAddress Values(2,null,'a','b','c','d');
Insert into @myAddress Values(3,null,'a','b','c','d');
Insert into @myAddress Values(5,null,'a','b','c','d');
Insert into @myAddress Values(6,null,'a','f','c','d');
Insert into @myAddress Values(7,null,'a','b','g','d');
Insert into @myAddress Values(8,null,'a','f','c','d');
with cte AS 
(
select  
    myaddress.id,
    myaddress.parentidofthisdup,              
    myaddress.add1, 
    myaddress.add2,
    myaddress.state,
    myaddress.zip,
    row_number() over (partition by add1, add2, state, zip order by id,add1, [state], zip, add2) as [rn]   
    from @myaddress myaddress


)


update r SET parentidOfthisDup
    = cte.id
    From cte Inner join @myAddress r
    ON cte.add1 = r.add1
          AND cte.add2 =r.add2
          AND cte.Zip =r.zip
          AND  cte.[state] =r.[state]
          and cte.id<>r.id
    WHERE cte.rn = 1 


select * from @myAddress

Flagging Duplicate Data in Same Table With Common Table Expression and Correlation Update

1 Answers1