| Baxter | Wallace | 57 3rd Ave., Apt 102 |
+-----------+-------------+--------------------------+
This technique works in the absence of a unique index, and it eliminates duplicate NULL values. It's handy for removing duplicates only for a specific set of rows within a table. However, if there are many different sets of duplicates to remove, this is not a procedure you'd want to carry out by hand. The process can be automated by using the techniques discussed earlier in Recipe 16.3 for determining which values are duplicated. There, we wrote a make_dup_count_query() function to generate the statement needed to count the number of duplicate values in a given set of columns in a table. The result of that statement can be used to generate a set of DELETE … LIMIT n statements that remove duplicate rows and leave only unique rows. The dups directory of the recipes distribution contains code that shows how to generate these statements.
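The statement-generation step can be sketched as follows. This is a minimal illustration, not the code from the dups directory: it takes the (values, count) pairs produced by running a duplicate-count query and emits one DELETE … LIMIT n statement per duplicated value combination. The table and column names are hypothetical, and the string interpolation is naive; production code should use placeholder parameters instead.

```python
def make_delete_limit_stmts(table, columns, dup_rows):
    """Generate DELETE ... LIMIT n statements that remove all but one
    instance of each duplicated value combination.

    dup_rows is a sequence of (values, count) pairs, such as the rows
    returned by a duplicate-count query like the one built by
    make_dup_count_query() in Recipe 16.3.
    """
    stmts = []
    for values, count in dup_rows:
        # Build a WHERE clause matching this duplicated combination.
        # (Naive quoting for illustration only; real code should pass
        # values via placeholders in the client API.)
        conds = " AND ".join(
            f"{col} = '{val}'" for col, val in zip(columns, values)
        )
        # Delete count-1 instances, leaving exactly one row behind.
        stmts.append(f"DELETE FROM {table} WHERE {conds} LIMIT {count - 1}")
    return stmts
```

For example, a combination that occurs twice yields a single statement with LIMIT 1:

```python
make_delete_limit_stmts(
    "catalog_list",
    ("last_name", "first_name"),
    [(("Baxter", "Wallace"), 2)],
)
# ["DELETE FROM catalog_list WHERE last_name = 'Baxter'
#   AND first_name = 'Wallace' LIMIT 1"]
```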
In general, using DELETE … LIMIT n is likely to be slower than removing duplicates by using a second table or by adding a unique index. Those methods keep the data on the server side and let the server do all the work. DELETE … LIMIT n involves a lot of client-server interaction because it uses a SELECT statement to retrieve information about duplicates, followed by several DELETE statements to remove instances of duplicated rows. Also, this technique does not prevent duplicates from occurring in the future.
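For comparison, the second-table method needs no per-duplicate round trips: the whole job runs in a fixed number of statements on the server. Here is a minimal sketch, under the assumption that entire rows are duplicated (a subset-of-columns variant would need a different SELECT); the table name is hypothetical:

```python
def make_server_side_dedup_stmts(table):
    """Generate statements that remove whole-row duplicates entirely
    on the server: copy distinct rows into a new table, then swap the
    new table into place. Note that SELECT DISTINCT treats NULL values
    as equal, so duplicate NULLs are eliminated as well.
    """
    return [
        f"CREATE TABLE {table}_new SELECT DISTINCT * FROM {table}",
        f"DROP TABLE {table}",
        f"RENAME TABLE {table}_new TO {table}",
    ]
```

However many duplicate sets the table contains, the client issues only these three statements, which is why the server-side methods tend to be faster. Adding a unique index to the rebuilt table would additionally prevent duplicates from reappearing.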