Database Reference
In-Depth Information
be dropped automatically. Because we cannot update the same table for which we're
checking for duplicates, we can note them in this temporary table. We'll use
INSERT...SELECT
to do this:
INSERT INTO
possible_duplicates
SELECT
name_first
,
name_last
FROM
(
SELECT
name_first
,
name_last
,
COUNT
(*)
AS
nbr_entries
FROM
humans
GROUP BY
name_first
,
name_last
)
AS
derived_table
WHERE
nbr_entries
>
1
;
This statement uses a subquery that selects the names and counts the number of entries
based on the
GROUP BY
clause. We saw how touse
GROUP BY
and
COUNT()
together
in
Counting and Grouping Results
, but their use here calls for a reiteration of how they
work. The subquery selects
name_first
and
name_last
, and groups them so that
any rows containing the same first and last names will be grouped together. They can then
be counted. We give the result of
COUNT(*)
an alias of
nbr_entries
so that we can
reference it elsewhere.
Back in the main SQL statement, the
WHERE
clause selects only rows from the subquery
in which there are more than one entry (i.e.,
nbr_entries
is greater than 1). These are
duplicate entries. This SQL statement will insert a row into the temporary table for rows
found in the
humans
table that have the same first and last name. It should enter only one
row in the temporary table for each person.
Now that we have a list of possible duplicates in the temporary table, let's update the
hu-
mans
table to note them:
UPDATE
humans
,
possible_duplicates
SET
possible_duplicate
=
1
WHERE
name_first
=
name_1
AND
name_last
=
name_2
;
That will set the value of the
possible_duplicate
column to 1 where the names in
the
humans
table match the names in
possible_duplicates
. When we're ready,
we can send an email to these members telling them that we have two entries for their
names and asking if the entries are duplicates. If they are, we might be able to merge the
information together (such as by creating another column for a second email address) and
delete the duplicate rows. As for the temporary table, it will be deleted when we close the
MySQL client.