How to update multiple duplicates with different values on the same table?

Question

The table I am dealing with has multiple rows which have the same values for lat and lon. The example shows that 1, 3, 5 have the same location but the name attribute differs. The hash is built from name, lat and lon and differs therefore.

BEFORE:

 id | name | lat | lon | flag | hash
----+------+-----+-----+------+------
 1  | aaa  | 16  | 48  |  0   | 2cd       <-- duplicate
 2  | bbb  | 10  | 22  |  0   | 3fc
 3  | ccc  | 16  | 48  |  0   | 8ba       <-- duplicate
 4  | ddd  | 10  | 23  |  0   | c33
 5  | eee  | 16  | 48  |  0   | 751       <-- duplicate

I need to identify "duplicates" within this table and want to assign the flag 1 (primary) to one of them and the flag 2 (secondary) to the others. It is not important which "duplicate" is flagged as primary.

 AFTER:

 id | name | lat | lon | flag | hash
----+------+-----+-----+------+------
 1  | aaa  | 16  | 48  |  1   | 2cd       <-- updated
 2  | bbb  | 10  | 22  |  0   | 3fc
 3  | ccc  | 16  | 48  |  2   | 8ba       <-- updated
 4  | ddd  | 10  | 23  |  0   | c33
 5  | eee  | 16  | 48  |  2   | 751       <-- updated

I started experimenting with INNER JOIN inspired by this post and this visual description. With this I am able to assign the same flag to all duplicates.

UPDATE table t1
INNER JOIN table_name t2
ON
  t1.lat = t2.lat
  AND t1.lon = t2.lon
  AND t1.hash != t2.hash
SET
  t1.flag = 2;

I also tested LEFT OUTER JOIN with WHERE t2.id IS NULL which could work when there are only two rows. However, I cannot think my head off how a JOIN should work with more then two duplicates. Mark Harrison also assumes "that you're joining on columns with no duplicates" at the beginning of his post which sound as if this is not a good idea.

I am using MySQL if this is of interest.

score 2 · Accepted Answer · answered Aug 09 '13 at 13:24

2

Not sure this is very efficient, but it works in just one query:

UPDATE t
JOIN (
  SELECT MAX(t.id) AS maxid, lat, lon
  FROM t
  JOIN t AS duplicates
  USING (lat, lon)
  GROUP BY lat, lon
  HAVING COUNT(*) > 1
) AS maxima USING (lat, lon)
SET flag = IF(id = maxid, 1, 2);

answered Aug 09 '13 at 13:24

RandomSeed

29,301
6
52
87

Works really fast! Thank you. Though, *SQL Fiddle* does not load for me (neither [the example](http://sqlfiddle.com/#viewSample)). – JJD Aug 10 '13 at 14:04

score 0 · Answer 2 · answered Aug 08 '13 at 23:42

0

Assuming you want all canonical records to have a flag=1 (and not just those records that have duplicates) you can do:

UPDATE table t1
SET t1.flag = 1
WHERE t1.id in (
    SELECT min(id)
    FROM table t2
    WHERE t1.lat = t2.lat
    AND t1.lon = t2.lon
);

UPDATE table
SET flag = 2
WHERE flag is null;

COMMIT;

answered Aug 08 '13 at 23:42

David Marx

8,172
3
45
66

Only duplicates should be updated. The statement you posted throws `Mysql2::Error: You can't specify target table 't1' for update in FROM clause`. – JJD Aug 09 '13 at 07:56

schlock · Answer 3 · 2013-08-09T00:01:47.350

You could do this in two steps:

update table t1 set flag = 2 where id in 
   (select a.id
    from 
        t1 a
        join t1 b on (b.lon = a.lon and b.lat = a.lat));

update table t1 set flag = 1 where id in 
   (select a.id 
    from t1 a
    where a.flag = 2 and a.id = 
        (select min(b.id) from t1 b where b.lon = a.lon and b.lat = a.lat)
   );

This should be used as inspiration and not as gospel (it's probably wrong) :)

How to update multiple duplicates with different values on the same table?

3 Answers3

Linked