Database Reference
In-Depth Information
The statement includes a HAVING clause that restricts the output to include only those
names that occur more than once. In general, to identify sets of values that are dupliā€
cated, do the following:
1. Determine which columns contain the values that may be duplicated.
2. List those columns in the column selection list, along with COUNT(*) .
3. List the columns in the GROUP BY clause as well.
4. Add a HAVING clause that eliminates unique values by requiring group counts to be
greater than one.
Queries constructed that way have the following form:
SELECT COUNT ( * ), column_list
FROM tbl_name
GROUP BY column_list
HAVING COUNT ( * ) > 1
It's easy to generate duplicate-finding queries like that within a program, given database
and table names and a nonempty set of column names. For example, here is a Perl
function make_dup_count_query() that generates the proper query for finding and
counting duplicated values in the specified columns:
sub make_dup_count_query
{
my ( $db_name , $tbl_name , @col_name ) = @_ ;
return "SELECT COUNT(*)," . join ( "," , @col_name )
. "\nFROM $db_name.$tbl_name"
. "\nGROUP BY " . join ( "," , @col_name )
. "\nHAVING COUNT(*) > 1" ;
}
make_dup_count_query() returns the query as a string. If you invoke it like this:
$str = make_dup_count_query ( "cookbook" , "catalog_list" ,
"last_name" , "first_name" );
the resulting value of $str is:
SELECT COUNT ( * ), last_name , first_name
FROM cookbook . catalog_list
GROUP BY last_name , first_name
HAVING COUNT ( * ) > 1
What you do with the query string is up to you. You can execute it from within the script
that creates it, pass it to another program, or write it to a file for execution later. The
dups directory of the recipes distribution contains a script named dup_count.pl that
you can use to try the function (as well as some translations into other languages).
Search WWH ::




Custom Search