Re: Removing duplicates in MySQL

To get a list of duplicate records, you can join the table on itself. Something like this ought to do it:

SELECT a.unique_id 
  FROM my_table as a, my_table as b 
 WHERE a.field1 = b.field1
   and a.field2 = b.field2
   and a.field3 = b.field3
   and a.unique_id != b.unique_id
[download]

This will return a list of all unique_id's where two rows share the same value of field1, field2, and field3. Note that it will return the unique_id of Both matching rows, not just one of them.

I don't have a suitable dataset to experiment on, but you might be able to also do a group by the fields which must match, and select max(unique_id) instead, to grab one row ID from each group of matching stuff. You might still need to use perl to loop through and delete results, but at least you'll then be doing a linear operaton instead of exponential in perl.

Update: changed unique_id to a.unique_id

You can group by a.unique_id to make sure you only get one row per unique id. However, in the case where you know that a large portion of your rows have duplicates, you're right: this still doesn't give you any information you don't already have Slightly better would be select a.unique_id, b.unique_id so you at least know which row each row matches. Still less than ideal, I agree.

Alan

Comment on Re: Removing duplicates in MySQL Select or Download Code

Replies are listed 'Best First'.
Re: Re: Removing duplicates in MySQL by tommyw (Hermit) on May 01, 2002 at 16:32 UTC
So this is going to tell you which unique id's have duplicates (and how many duplicates they have, as that'll be the number of times they're spat out), but not much more than that. A table with one set of three duplicates and one pair is going to return each of the first three id's twice, and each of the pair twice, for a total of ten records output, which then need much cleaning up. Nah, don't like it. -- Tommy Too stupid to live. Too stubborn to die.	[reply]

Replies are listed 'Best First'.

Re: Re: Removing duplicates in MySQL
by tommyw (Hermit) on May 01, 2002 at 16:32 UTC

So this is going to tell you which unique id's have duplicates (and how many duplicates they have, as that'll be the number of times they're spat out), but not much more than that.

A table with one set of three duplicates and one pair is going to return each of the first three id's twice, and each of the pair twice, for a total of ten records output, which then need much cleaning up.

Nah, don't like it.

--
Tommy
Too stupid to live.
Too stubborn to die.

[reply]