Database Reference
In-Depth Information
Removing duplicates using table replacement
If a row is considered to duplicate another only if the entire row is the same, one way
to eliminate duplicates from a table is to select its unique rows into a new table that has
the same structure, and then replace the original table with the new one:
1. Create a new table that has the same structure as the original one. CREATE TABLE
LIKE is useful for this (see Recipe 4.1 ):
mysql> CREATE TABLE tmp LIKE catalog_list;
2. Use INSERT INTO SELECT DISTINCT to select the unique rows from the original
table into the new one:
mysql> INSERT INTO tmp SELECT DISTINCT * FROM catalog_list;
Select rows from the tmp table to verify that the new table contains no duplicates:
mysql> SELECT * FROM tmp ORDER BY last_name, first_name;
+-----------+-------------+--------------------------+
| last_name | first_name | street |
+-----------+-------------+--------------------------+
| Baxter | Wallace | 57 3rd Ave. |
| Baxter | Wallace | 57 3rd Ave., Apt 102 |
| Brown | Bartholomew | 432 River Run |
| Isaacson | Jim | 515 Fordam St., Apt. 917 |
| McTavish | Taylor | 432 River Run |
| Pinter | Marlene | 9 Sunset Trail |
+-----------+-------------+--------------------------+
3. After creating the new tmp table that contains unique rows, use it to replace the
original catalog_list table:
mysql> DROP TABLE catalog_list;
mysql> RENAME TABLE tmp TO catalog_list;
The effective result of this procedure is that catalog_list no longer contains duplicates.
This table-replacement method works in the absence of an index (although it might be
slow for large tables). For tables that contain duplicate NULL values, it removes those
duplicates. It does not prevent the occurrence of duplicates in the future.
This method requires rows to be completely identical to be considered duplicates. Thus,
it treats as distinct those rows for Wallace Baxter that have slightly different street
values.
If duplicates are defined only with respect to a subset of the columns in the table, create
a new table that has a unique index for those columns, select rows into it using IN
SERT IGNORE , and replace the original table with the new one:
mysql> CREATE TABLE tmp LIKE catalog_list;
mysql> ALTER TABLE tmp ADD PRIMARY KEY (last_name, first_name);
mysql> INSERT IGNORE INTO tmp SELECT * FROM catalog_list;
Search WWH ::




Custom Search