SQL: Eliminate duplicates in table

Spider · 16 Jan 2009 at 09:01

I have a large table containing some duplicate rows (but each row has a unique ID) where all columns are identical between duplicates but for the unique ID. I want to filter out the duplicates (not just the duplicate rows, but the both rows when they are duplicated).

So, for example if I had:

ID Name Address TelNo DOB height weight
2 Joe 123 123 01-Jan-2000 82 80
3 Mary 666 666 21-Feb-1982 66 60
4 Mary 666 666 21-Feb-1982 66 60
5 Bob 010 010 05-May-1974 74 70

I want to eliminate both of the center rows, leaving only the non-duplicates

I’ve been able to group by the common fields which is selecting the data

Mary 666 666 21-Feb-1982 66 60

With a count of two, but how then can I get (and isolate) their ID’s to filter them out? I'm guessing I need some form of inner join on the same table, but can’t fathom it out, despite it being relatively simple (Its Friday!)

Thanks for any tips

suarve · 16 Jan 2009 at 09:12

You could use a select statement to copy all the unique records to a temp table. You could then delete the old table and rename the temp table. So something along the lines of:

Code:

CREATE TABLE tbltemp SELECT name, address, telno, dob, height, weight
   FROM tblpeople;
GROUP BY (name, address);
DROP TABLE tblPeople;
ALTER TABLE temp RENAME TO tblpeople;

Spider · 16 Jan 2009 at 09:30

I cannot do that, unfortunately. This needs to all take place within a single select statement as It is being used as part of a view to transform the data

SimonCHere · 16 Jan 2009 at 09:32

If you don't want to mess around with renaming tables, something like:

Code:

delete largetable
from largetable orig
inner join (select Name, Address, TelNo, DOB, height, weight
              from largetable
              group by Name, Address, TelNo, DOB, height, weight
              having count(id) > 1) dup
on orig.name = dup.name
and orig.Address = dup.Address
and orig.TelNo = dup.TelNo
and orig.DOB = dup.DOB
and orig.height = dup.height
and orig.weight = dup.weight

philjohn · 16 Jan 2009 at 13:37

Using a subquery as the source of a WHERE IN clause may be faster if you have lots of data in the table.

SimonCHere · 16 Jan 2009 at 13:47

philjohn said:
Using a subquery as the source of a WHERE IN clause may be faster if you have lots of data in the table.

Maybe, but how do you get the subselect to return the unique id's for the duplicate rows to delete?

Friday afternoon brain fade :confused:

, so would be interested in your solution.

Simon