Normalization: Some Generalities - Database Design and Relational Theory

Databases Reference

In-Depth Information

(and vice versa) that produces the same result. Clearly, such “losslessness” of decompositions is an

important property; whatever we do by way of normalization, we certainly mustn't lose any information

when we do it.



It follows from the foregoing that just as projection is the decomposition operator (with respect to

normalization as conventionally understood), so join is the corresponding re composition operator.

NORMALIZATION SERVES TWO PURPOSES

So far, so good; this is all very familiar stuff. But now I want to point out that if you've been paying careful

attention, you might reasonably accuse me of practicing a tiny (?) deception ... To be specific, I've considered what

it means for a decomposition of relations to be nonloss; but normalization, which is what we're supposed to be

talking about, isn't a matter of decomposing relations, it's a matter of decomposing relvars .

Suppose we do decide to perform the suggested decomposition of relvar S into relvars SNC and CT.

Observe now that I really am talking about relvars and not relations; for definiteness, however, let's assume those

relvars have the sample values shown in Figs. 3.1 and 3.2, respectively. For definiteness again, let's focus on relvar

CT specifically. Well, that relvar is indeed a relvar—I mean, it's a variable—and so we can update it. For example

(using the shorthand notation for tuples introduced in Chapter 2), we might insert the tuple:

( 'Rome' , 10 )

But after that update, relvar CT contains a tuple that had no counterpart in relvar S (it doesn't have a

counterpart in relvar SNC either, come to that). Now, such a possibility is often used—indeed, Codd used it himself

in his very first papers on normalization (see Appendix C)—as an argument in favor of doing the normalization in

the first place: The normalized, two-relvar design is capable of representing certain information that the original

one-relvar design isn't. (In the case at hand, it can represent status information for cities that currently have no

supplier located in them.) But that same fact also means that the two designs aren't really information equivalent

after all, and moreover that relvar CT isn't exactly a “projection” of relvar S after all 4 —it contains a tuple that isn't a

projection of, or otherwise derived from, any tuple in relvar S. 5 Or rather (and perhaps more to the point), CT isn't a

projection of the join of SNC and CT, either, and so that join “loses information,” in a sense; to be specific, it loses

the information that the status for Rome is 10. 6

A similar situation arises if we delete the tuple

( 'S5' , 'Adams' , 'Athens' )

4 See later in this section for an explanation of why I place the term “projection” in quotation marks here.

5 Regarding the idea that one tuple might be a projection of another, see the answer to Exercise 2.11 in Appendix D.

6 Joins such as that of SNC and CT are sometimes called lossy joins for this very reason. However, this term is probably best avoided, because it

could also be used to refer to joins such as the join of the projections of S on{SNO,SNAME,STATUS} and {CITY,STATUS}, which lose

information for a different reason. See the discussion of this latter example in Chapter 5; see also Exercise 3.2.

Search WWH ::

Custom Search

Home