Database Reference
In-Depth Information
• When you create a table, include a primary key or unique index to prevent dupli‐
cates from being added to the table. MySQL uses the index as a constraint to enforce
the requirement that each row in the table contains a unique key in the indexed
column or columns.
• In conjunction with a unique index, the
INSERT
IGNORE
and
REPLACE
statements
enable you to handle insertion of duplicate rows gracefully without generating er‐
rors. For bulk-loading operations, the same options are available in the form of the
IGNORE
or
REPLACE
modifiers for the
LOAD
DATA
statement.
• To determine whether a table contains duplicates, use
GROUP
BY
to categorize rows
into groups, and
COUNT()
to see how many rows are in each group.
Chapter 8
describes these techniques in the context of producing summaries, but they're useful
for duplicate counting and identification as well. A counting summary groups val‐
ues into categories to determine how frequently each one occurs.
•
SELECT
DISTINCT
removes duplicate rows from a result set (see
Recipe 3.4
for more
information). For an existing table that already contains duplicates, you can select
unique rows into a second table and use it to replace the original table. Or, if you
determine that there are
n
identical rows in a table, you can use
DELETE
…
LIMIT
to eliminate
n
-1 instances from that specific set of rows.
Scripts related to the examples shown in this chapter are located in the
dups
directory
of the
recipes
distribution. For scripts that create the tables used here, look in the
tables
directory.
16.1. Preventing Duplicates from Occurring in a Table
Problem
You want to prevent a table from ever containing duplicates.
Solution
Use a
PRIMARY
KEY
or a
UNIQUE
index.
Discussion
To ensure that rows in a table are unique, some column or combination of columns
must be required to contain unique values in each row. When this requirement is sat‐
isfied, you can refer to any row in the table unambiguously by using its unique identifier.
To make sure a table has this characteristic, include a
PRIMARY
KEY
or
UNIQUE
index in
the table structure. The following table contains no such index, so it permits duplicate
rows: