Databases Reference

In-Depth Information

Now,
normal form
is just another term for canonical form. So when we talk about normal forms in the

database context, we're talking about a canonical representation for data. To spell the point out: Any given

collection of data can be represented relationally in many different ways, as we know. Of course, all of those ways

are—in fact, must be—information equivalent; that is, information equivalence is the kind of equivalence we appeal

to in this particular context. However, some of those ways (of representing the given information) are preferred

over others for various reasons. And those preferred ways are, of course, the relational normal forms that are the

subject of much of this topic.

As for the term
normalization
, it simply refers to the general process of mapping some given object into its

canonical equivalent. In the database context in particular, therefore, it's used (as we know) to refer to the process

of mapping some given relvar into a collection of relvars that (a) when considered together, are information

equivalent to the original relvar, but (b) are each individually in some preferred normal form.

To the foregoing I should perhaps add the following. As far as I know, Codd himself never mentioned, in his

early writings on the subject, his reasons for introducing the terminology of normal forms or normalization. But

many years afterward, he did go on record with his own explanation:
13

Interviewer:
Where did “normalization” come from?

Codd:
It seemed to me essential that some discipline be introduced into database design. I called it normalization

because then President Nixon was talking a lot about normalizing relations with China. I figured that if he could

normalize relations, so could I.

EXERCISES

3.1 Consider the STP example from the section “The Place of Design Theory” in Chapter 1. Give examples of

the update anomalies that can arise with that example. Also give an appropriate decomposition, and show how that

decomposition avoids those anomalies.

3.2 Nonloss decomposition is based on the idea that a relation can be decomposed into projections in such a way

that the original relation can be recovered by joining those projections back together again. In fact, if projections
r1

and
r2
of relation
r
are such that every attribute of
r
is retained in at least one of
r1
and
r2
, then joining
r1
and
r2

will always produce every tuple of
r
. Prove this assertion. (It follows from this fact that the problem with a

decomposition that's not nonloss isn't that the join loses tuples—rather, it's that it produces additional, or

“spurious,” tuples. Since we have no way in general of knowing which if any of the tuples in the join are spurious

and which are genuine, the decomposition has lost information.)

3.3 “Normalization serves two purposes.” Explain this remark in your own words. Do you think the point is

widely understood?

13
In “A Fireside Chat: Interview with Dr. Edgar F. Codd” (
DBMS Magazine 6
, No. 13, December 1993).