Database Reference
In-Depth Information
Before you delete duplicate data, remember that sometimes it isn't the data that is wrong, it
is your understanding of it. In those cases, it may be that you haven't properly normalized your
database model, and that you need to include additional tables to account for the shape of
the data. You might also find that duplicate rows are caused by having decided to exclude a
column somewhere earlier in a data load process. Check twice, delete once.
How to do it...
First, identify the duplicates using a query, such as the following:
SELECT *
FROM cust
WHERE customerid IN
(SELECT customerid
FROM cust
GROUP BY customerid
HAVING count(*) > 1);
The results can be used to identify the bad data manually, and resolve the problem by carrying
out the following steps:
F Merge the two rows to give the best picture of the data, if required. This might use
values from one row to update the row you decide to keep, such as:
UPDATE cust
SET age = 47
WHERE customerid = 4
AND lastname = 'Palmer';
F DELETE the remaining undesirable rows:
DELETE FROM cust
WHERE customerid = 4
AND lastname = 'Hall';
In some cases, the data rows might be completely identical, as in the table new_cust ,
which looks like the following:
postgres=# SELECT * FROM new_cust;
customerid
------------
1
2
3
4
4
(5 rows)
 
Search WWH ::




Custom Search