Databases Reference
In-Depth Information
Now recall the predicate for relvar S (see the answer to Exercise 2.6 in Appendix D):
Supplier SNO is named SNAME and is located in city CITY, which has status STATUS.
This latter predicate is clearly not the same as the predicate for the join. To be more precise, if some given tuple t
satisfies it, then that tuple t also satisfies the predicate for the join, but the converse isn't true. That's why the join
“loses information” or “is lossy”—just because some tuple appears in the join, we can't assume it also appears in the
original relvar S.
So what exactly is it that makes some decompositions nonloss and others lossy? This is the question that lies
at the heart of normalization theory. It can be stated formally thus:
Let r be a relation and let r1 , ..., rn be projections of r . What conditions must be satisfied in order for r to be
equal to the join of those projections? (By the way, note the tacit assumption here that—as noted earlier—
join is an n- adic operator.)
An important, albeit partial, answer to this question was provided by Ian Heath in 1971 when he proved the
following theorem:
Heath's Theorem ( for relations ): Let relation r have heading H and let X , Y , and Z be subsets of H such that
the union of X , Y , and Z is equal to H . Let XY denote the union of X and Y , and similarly for XZ . If r satisfies
the FD X Y , then r is equal to the join of its projections on XY and XZ .
By way of example, consider the suppliers relation once again (i.e., the current value of relvar S as shown in
Fig. 1.1). That relation satisfies the FD {CITY} → {STATUS}. Thus, taking X as {CITY}, Y as {STATUS}, and Z
as {SNO,SNAME}, Heath's Theorem tells us that the decomposition of that relation into its projections on
{CITY,STATUS} and {CITY,SNO,SNAME} 5 is nonloss—as indeed we already know.
Now, it's important to understand that (to repeat) Heath's answer to the original question was only partial.
I'll explain what this means in terms of the foregoing example. Basically, the theorem does tell us the
decomposition into projections SNC and CT (see Fig. 3.2 in Chapter 3) is nonloss; however, it doesn't tell us the
one into SNT and CT (see Fig. 5.1) is lossy. In other words, if we decompose on the basis of an FD, as we did in the
example of Fig. 3.2, then Heath's Theorem says the decomposition will be nonloss; but if we decompose on some
other basis, as we did in the example of Fig. 5.1, then the theorem has nothing to say on the matter. Thus, the
theorem gives a sufficient condition, but not a necessary one, for a given (binary) decomposition to be nonloss. It
follows that it might be possible to decompose relation r in a nonloss way into its projections on XY and XZ even if
it doesn't satisfy the FD X Y . Note: I'll be describing a stronger form of Heath's Theorem, giving both necessary
and sufficient conditions, later in this topic (see Chapter 12).
As an aside, I remark that in the paper in which he proved his theorem, Heath also gave a definition of what
he called “third” normal form that was in fact a definition of BCNF. Since that definition preceded Boyce and
Codd's own definition by some three years, it seems to me that BCNF ought by rights to be called Heath normal
form. But it isn't.
Now, in Chapter 3, in the section “Normalization Serves Two Purposes,” I said something like the following:
If you've been paying careful attention, you might reasonably accuse me of practicing a tiny deception in the foregoing
discussion. To be specific, I've considered what it means for a decomposition of relations to be nonloss; but
5 Or, as we would “more naturally” tend to write them, interchanging the two sets of attributes and specifying the individual attributes in a “more
natural” order, on {SNO,SNAME,CITY} and {CITY,STATUS}.
Search WWH ::

Custom Search