Databases Reference
In-Depth Information
Now, normal form is just another term for canonical form. So when we talk about normal forms in the
database context, we're talking about a canonical representation for data. To spell the point out: Any given
collection of data can be represented relationally in many different ways, as we know. Of course, all of those ways
are—in fact, must be—information equivalent; that is, information equivalence is the kind of equivalence we appeal
to in this particular context. However, some of those ways (of representing the given information) are preferred
over others for various reasons. And those preferred ways are, of course, the relational normal forms that are the
subject of much of this topic.
As for the term normalization , it simply refers to the general process of mapping some given object into its
canonical equivalent. In the database context in particular, therefore, it's used (as we know) to refer to the process
of mapping some given relvar into a collection of relvars that (a) when considered together, are information
equivalent to the original relvar, but (b) are each individually in some preferred normal form.
To the foregoing I should perhaps add the following. As far as I know, Codd himself never mentioned, in his
early writings on the subject, his reasons for introducing the terminology of normal forms or normalization. But
many years afterward, he did go on record with his own explanation: 13
Interviewer: Where did “normalization” come from?
Codd: It seemed to me essential that some discipline be introduced into database design. I called it normalization
because then President Nixon was talking a lot about normalizing relations with China. I figured that if he could
normalize relations, so could I.
EXERCISES
3.1 Consider the STP example from the section “The Place of Design Theory” in Chapter 1. Give examples of
the update anomalies that can arise with that example. Also give an appropriate decomposition, and show how that
decomposition avoids those anomalies.
3.2 Nonloss decomposition is based on the idea that a relation can be decomposed into projections in such a way
that the original relation can be recovered by joining those projections back together again. In fact, if projections r1
and r2 of relation r are such that every attribute of r is retained in at least one of r1 and r2 , then joining r1 and r2
will always produce every tuple of r . Prove this assertion. (It follows from this fact that the problem with a
decomposition that's not nonloss isn't that the join loses tuples—rather, it's that it produces additional, or
“spurious,” tuples. Since we have no way in general of knowing which if any of the tuples in the join are spurious
and which are genuine, the decomposition has lost information.)
3.3 “Normalization serves two purposes.” Explain this remark in your own words. Do you think the point is
widely understood?
13 In “A Fireside Chat: Interview with Dr. Edgar F. Codd” ( DBMS Magazine 6 , No. 13, December 1993).
Search WWH ::




Custom Search