Databases Reference
In-Depth Information
(and vice versa) that produces the same result. Clearly, such “losslessness” of decompositions is an
important property; whatever we do by way of normalization, we certainly mustn't lose any information
when we do it.
It follows from the foregoing that just as projection is the decomposition operator (with respect to
normalization as conventionally understood), so join is the corresponding re composition operator.
So far, so good; this is all very familiar stuff. But now I want to point out that if you've been paying careful
attention, you might reasonably accuse me of practicing a tiny (?) deception ... To be specific, I've considered what
it means for a decomposition of relations to be nonloss; but normalization, which is what we're supposed to be
talking about, isn't a matter of decomposing relations, it's a matter of decomposing relvars .
Suppose we do decide to perform the suggested decomposition of relvar S into relvars SNC and CT.
Observe now that I really am talking about relvars and not relations; for definiteness, however, let's assume those
relvars have the sample values shown in Figs. 3.1 and 3.2, respectively. For definiteness again, let's focus on relvar
CT specifically. Well, that relvar is indeed a relvar—I mean, it's a variable—and so we can update it. For example
(using the shorthand notation for tuples introduced in Chapter 2), we might insert the tuple:
( 'Rome' , 10 )
But after that update, relvar CT contains a tuple that had no counterpart in relvar S (it doesn't have a
counterpart in relvar SNC either, come to that). Now, such a possibility is often used—indeed, Codd used it himself
in his very first papers on normalization (see Appendix C)—as an argument in favor of doing the normalization in
the first place: The normalized, two-relvar design is capable of representing certain information that the original
one-relvar design isn't. (In the case at hand, it can represent status information for cities that currently have no
supplier located in them.) But that same fact also means that the two designs aren't really information equivalent
after all, and moreover that relvar CT isn't exactly a “projection” of relvar S after all 4 —it contains a tuple that isn't a
projection of, or otherwise derived from, any tuple in relvar S. 5 Or rather (and perhaps more to the point), CT isn't a
projection of the join of SNC and CT, either, and so that join “loses information,” in a sense; to be specific, it loses
the information that the status for Rome is 10. 6
A similar situation arises if we delete the tuple
( 'S5' , 'Adams' , 'Athens' )
4 See later in this section for an explanation of why I place the term “projection” in quotation marks here.
5 Regarding the idea that one tuple might be a projection of another, see the answer to Exercise 2.11 in Appendix D.
6 Joins such as that of SNC and CT are sometimes called lossy joins for this very reason. However, this term is probably best avoided, because it
could also be used to refer to joins such as the join of the projections of S on{SNO,SNAME,STATUS} and {CITY,STATUS}, which lose
information for a different reason. See the discussion of this latter example in Chapter 5; see also Exercise 3.2.
Search WWH ::

Custom Search