Database Reference
In-Depth Information
When I turned 12, I received a trunk full of baseball cards as a birthday present from my
parents. I was delighted, not just because there might have been a Hank Aaron or Pete Rose
buried somewhere in that trunk, but because I loved to organize the cards. I categorized
each card according to year and team. Organizing the cards in this way gave me a deep
understanding of the players and their teams. To this day, I can answer many baseball trivia
questions.
Normalization, in general, is the process of applying a set of rules with the goal of organ-
izing
something
. I was normalizing the baseball cards according to year and team. We can
also apply a set of rules and normalize the attributes within our organizations. The rules are
based on how the business works, and the relational data model captures the business rules,
which is why normalization is the primary technique used in building the relational logical
data model.
Just as those baseball cards initially lay unsorted in that trunk, our companies have huge
numbers of attributes spread throughout departments and applications. The rules applied
to normalizing the baseball cards entailed first sorting by year and then by team within
a year. The rules for normalizing our attributes can be boiled down to a single sentence:
Make sure every attribute is
single-valued
and
provides a fact completely
and
only
about
its primary key.
Single-valued
means an attribute must contain only one piece of inform-
ation. If
Consumer Name
contains
Consumer First Name
and
Consumer Last Name
,
for example, we must split
Consumer Name
into two attributes -
Consumer First Name
and
Consumer Last Name
.
Provides a fact
means that a given primary key value will al-
ways return no more than one of every attribute that is in the same entity with this key. If
a Customer Identifier value of
123
, for example, returns three customer last names (
Smith
,
Jones
, and
Roberts
),
Customer Last Name
violates this part of the normalization defini-
tion.
Completely
means that the minimal set of attributes that uniquely identify an instance
of the entity is present in the primary key.
Only
means that each attribute must provide a
fact about the primary key and nothing else. That is, there can be no hidden dependencies,
so we would need to remove derived attributes.
We cannot determine if every attribute is single-valued and provides a fact completely and
only about its primary key unless we understand the data. To understand the data, we usu-
ally need to ask lots of questions. Therefore a second definition for normalization is:
A
formal process of asking business questions.
Even for an apparently simple attribute such
as Phone Number
,
for example, we can ask many questions:
•
Whose phone number is this?