Database Reference
In-Depth Information
CHAPTER 16
Handling Duplicates
16.0. Introduction
Tables or result sets sometimes contain duplicate rows. In some cases this is acceptable.
For example, if you conduct a web poll that records date and client IP number along
with the votes, duplicate rows may be permitted because it's possible for large numbers
of votes to appear to originate from the same IP number for an Internet service that
routes traffic from its customers through a single proxy host. In other cases, duplicates
are unacceptable, and you'll want to take steps to avoid them. Operations involved in
handling duplicate rows include the following:
Preventing duplicates from being created in the first place. If each row in a table is
intended to represent a single entity (such as a person, an item in a catalog, or a
specific observation in an experiment), the occurrence of duplicates presents sig‐
nificant difficulties in using it that way. Duplicates make it impossible to refer to
each row unambiguously, so it's best to make sure duplicates never occur.
Counting the number of duplicates to determine whether they are present and to
what extent.
• Identifying duplicated values (or the rows containing them) so you can see where
they occur.
• Eliminating duplicates to ensure that each row is unique. This may involve remov‐
ing rows from a table to leave only unique rows or selecting a result set in such a
way that no duplicates appear in the output. For example, to display a list of the
states in which you have customers, you probably don't want a long list of state
names from all customer records. A list showing each state name only once suffices
and is easier to understand.
Several tools are at your disposal for dealing with duplicate rows. Choose them accord‐
ing to the objective that you want to achieve:
 
Search WWH ::




Custom Search