| Baxter | Wallace | 57 3rd Ave., Apt 102 |
+-----------+-------------+--------------------------+
This technique works in the absence of a unique index, and it eliminates duplicate NULL values. It's handy for removing duplicates only for a specific set of rows within a table. However, if there are many different sets of duplicates to remove, this is not a procedure you'd want to carry out by hand. The process can be automated by using the techniques discussed earlier in Recipe 16.3 for determining which values are duplicated. There, we wrote a make_dup_count_query() function to generate the statement needed to count the number of duplicate values in a given set of columns in a table. The result of that statement can be used to generate a set of DELETE … LIMIT n statements that remove duplicate rows and leave only unique rows. The dups directory of the recipes distribution contains code that shows how to generate these statements.
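The statement-generation step can be sketched as follows. This is a minimal illustration, not the code from the dups directory: it takes the (values, count) pairs produced by running a duplicate-count query and emits one DELETE … LIMIT n statement per duplicated value combination. The table and column names are hypothetical, and the string interpolation is naive; production code should use placeholder parameters instead.

```python
def make_delete_limit_stmts(table, columns, dup_rows):
    """Generate DELETE ... LIMIT n statements that remove all but one
    instance of each duplicated value combination.

    dup_rows is a sequence of (values, count) pairs, such as the rows
    returned by a duplicate-count query like the one built by
    make_dup_count_query() in Recipe 16.3.
    """
    stmts = []
    for values, count in dup_rows:
        # Build a WHERE clause matching this duplicated combination.
        # (Naive quoting for illustration only; real code should pass
        # values via placeholders in the client API.)
        conds = " AND ".join(
            f"{col} = '{val}'" for col, val in zip(columns, values)
        )
        # Delete count-1 instances, leaving exactly one row behind.
        stmts.append(f"DELETE FROM {table} WHERE {conds} LIMIT {count - 1}")
    return stmts
```

For example, a combination that occurs twice yields a single statement with LIMIT 1:

```python
make_delete_limit_stmts(
    "catalog_list",
    ("last_name", "first_name"),
    [(("Baxter", "Wallace"), 2)],
)
# ["DELETE FROM catalog_list WHERE last_name = 'Baxter'
#   AND first_name = 'Wallace' LIMIT 1"]
```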
In general, using DELETE … LIMIT n is likely to be slower than removing duplicates by using a second table or by adding a unique index. Those methods keep the data on the server side and let the server do all the work. DELETE … LIMIT n involves a lot of client-server interaction because it uses a SELECT statement to retrieve information about duplicates, followed by several DELETE statements to remove instances of duplicated rows. Also, this technique does not prevent duplicates from occurring in the future.
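For comparison, the second-table method needs no per-duplicate round trips: the whole job runs in a fixed number of statements on the server. Here is a minimal sketch, under the assumption that entire rows are duplicated (a subset-of-columns variant would need a different SELECT); the table name is hypothetical:

```python
def make_server_side_dedup_stmts(table):
    """Generate statements that remove whole-row duplicates entirely
    on the server: copy distinct rows into a new table, then swap the
    new table into place. Note that SELECT DISTINCT treats NULL values
    as equal, so duplicate NULLs are eliminated as well.
    """
    return [
        f"CREATE TABLE {table}_new SELECT DISTINCT * FROM {table}",
        f"DROP TABLE {table}",
        f"RENAME TABLE {table}_new TO {table}",
    ]
```

However many duplicate sets the table contains, the client issues only these three statements, which is why the server-side methods tend to be faster. Adding a unique index to the rebuilt table would additionally prevent duplicates from reappearing.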