Database Reference
In-Depth Information
The statement includes a
HAVING
clause that restricts the output to include only those
names that occur more than once. In general, to identify sets of values that are dupliā
cated, do the following:
1. Determine which columns contain the values that may be duplicated.
2. List those columns in the column selection list, along with
COUNT(*)
.
3. List the columns in the
GROUP
BY
clause as well.
4. Add a
HAVING
clause that eliminates unique values by requiring group counts to be
greater than one.
Queries constructed that way have the following form:
SELECT
COUNT
(
*
),
column_list
FROM
tbl_name
GROUP
BY
column_list
HAVING
COUNT
(
*
)
>
1
It's easy to generate duplicate-finding queries like that within a program, given database
and table names and a nonempty set of column names. For example, here is a Perl
function
make_dup_count_query()
that generates the proper query for finding and
counting duplicated values in the specified columns:
sub
make_dup_count_query
{
my
(
$db_name
,
$tbl_name
,
@col_name
)
=
@_
;
return
"SELECT COUNT(*),"
.
join
(
","
,
@col_name
)
.
"\nFROM $db_name.$tbl_name"
.
"\nGROUP BY "
.
join
(
","
,
@col_name
)
.
"\nHAVING COUNT(*) > 1"
;
}
make_dup_count_query()
returns the query as a string. If you invoke it like this:
$str
=
make_dup_count_query
(
"cookbook"
,
"catalog_list"
,
"last_name"
,
"first_name"
);
the resulting value of
$str
is:
SELECT
COUNT
(
*
),
last_name
,
first_name
FROM
cookbook
.
catalog_list
GROUP
BY
last_name
,
first_name
HAVING
COUNT
(
*
)
>
1
What you do with the query string is up to you. You can execute it from within the script
that creates it, pass it to another program, or write it to a file for execution later. The
dups
directory of the
recipes
distribution contains a script named
dup_count.pl
that
you can use to try the function (as well as some translations into other languages).