| Baxter | Wallace | 57 3rd Ave., Apt 102 |
+-----------+-------------+--------------------------+
This technique works in the absence of a unique index, and it eliminates duplicate NULL
values. It's handy when you want to remove duplicates only from a specific set of rows
within a table.
However, if there are many different sets of duplicates to remove, this is not a procedure
you'd want to carry out by hand. The process can be automated by using the techniques
discussed earlier in Recipe 16.3 for determining which values are duplicated. There, we
wrote a make_dup_count_query() function to generate the statement needed to count
the number of duplicate values in a given set of columns in a table. The result of that
statement can be used to generate a set of DELETE LIMIT n statements that remove
duplicate rows and leave only unique rows. The dups directory of the recipes
distribution contains code that shows how to generate these statements.
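
To make that generation step concrete, here is a minimal sketch in pure SQL. The
table and column names (catalog_list with last_name, first_name, and street) are
assumptions inferred from the output fragment above, not taken from the recipes
distribution. The query emits one DELETE ... LIMIT statement per duplicated row set:

    -- Sketch only: catalog_list and its columns are assumed names.
    -- The NULL-safe <=> comparison lets the generated statements match
    -- duplicate NULL values; QUOTE() renders NULL as an unquoted literal,
    -- so CONCAT() never collapses the whole statement to NULL.
    SELECT CONCAT(
             'DELETE FROM catalog_list',
             ' WHERE last_name <=> ', QUOTE(last_name),
             ' AND first_name <=> ', QUOTE(first_name),
             ' AND street <=> ', QUOTE(street),
             ' LIMIT ', COUNT(*) - 1, ';'
           ) AS delete_stmt
    FROM catalog_list
    GROUP BY last_name, first_name, street
    HAVING COUNT(*) > 1;

Feeding the generated statements back through the mysql client leaves exactly one
row per duplicate set, because each LIMIT clause stops its DELETE after removing
count − 1 of the matching rows.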
In general, using DELETE LIMIT n is likely to be slower than removing duplicates by
using a second table or by adding a unique index. Those methods keep the data on the
server side and let the server do all the work. DELETE LIMIT n involves a lot of client-
server interaction because it uses a SELECT statement to retrieve information about
duplicates, followed by several DELETE statements to remove instances of duplicated
rows. Also, this technique does not prevent duplicates from occurring in the future.
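
For comparison, here is a sketch of the server-side alternative mentioned above,
again using the assumed catalog_list table: copy distinct rows into a second table
that carries a unique index, then swap it into place. Unlike DELETE LIMIT n, this
keeps all the work on the server, and the index blocks future duplicates. Note,
however, that a unique index permits multiple NULL values, so this method does not
remove duplicates that match only on NULLs:

    -- Sketch only: table name is an assumption. Deduplicate by copying
    -- into an indexed table, then replace the original.
    CREATE TABLE catalog_list_new LIKE catalog_list;
    ALTER TABLE catalog_list_new
      ADD UNIQUE INDEX (last_name, first_name, street);
    -- INSERT IGNORE silently discards rows that would violate the unique
    -- index, so only the first copy of each duplicate set survives.
    INSERT IGNORE INTO catalog_list_new
      SELECT * FROM catalog_list;
    DROP TABLE catalog_list;
    RENAME TABLE catalog_list_new TO catalog_list;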