SQL: Eliminate duplicates in table

Associate
Joined
28 Nov 2002
Posts
766
Location
Down the road
I have a large table containing some duplicate rows (but each row has a unique ID) where all columns are identical between duplicates but for the unique ID. I want to filter out the duplicates (not just the duplicate rows, but the both rows when they are duplicated).

So, for example if I had:

ID Name Address TelNo DOB height weight
2 Joe 123 123 01-Jan-2000 82 80
3 Mary 666 666 21-Feb-1982 66 60
4 Mary 666 666 21-Feb-1982 66 60
5 Bob 010 010 05-May-1974 74 70

I want to eliminate both of the center rows, leaving only the non-duplicates

I’ve been able to group by the common fields which is selecting the data

Mary 666 666 21-Feb-1982 66 60

With a count of two, but how then can I get (and isolate) their ID’s to filter them out? I'm guessing I need some form of inner join on the same table, but can’t fathom it out, despite it being relatively simple (Its Friday!)

Thanks for any tips
 
Soldato
Joined
8 Oct 2005
Posts
4,185
Location
Midlands, UK
You could use a select statement to copy all the unique records to a temp table. You could then delete the old table and rename the temp table. So something along the lines of:

Code:
CREATE TABLE tbltemp SELECT name, address, telno, dob, height, weight
   FROM tblpeople;
GROUP BY (name, address);
DROP TABLE tblPeople;
ALTER TABLE temp RENAME TO tblpeople;
 
Last edited:
Associate
Joined
24 Jun 2008
Posts
1,168
If you don't want to mess around with renaming tables, something like:

Code:
delete largetable
from largetable orig
inner join (select Name, Address, TelNo, DOB, height, weight
              from largetable
              group by Name, Address, TelNo, DOB, height, weight
              having count(id) > 1) dup
on orig.name = dup.name
and orig.Address = dup.Address
and orig.TelNo = dup.TelNo
and orig.DOB = dup.DOB
and orig.height = dup.height
and orig.weight = dup.weight
 
Associate
Joined
24 Jun 2008
Posts
1,168
Using a subquery as the source of a WHERE IN clause may be faster if you have lots of data in the table.

Maybe, but how do you get the subselect to return the unique id's for the duplicate rows to delete?

Friday afternoon brain fade :confused:, so would be interested in your solution.

Simon
 
Back
Top Bottom