Databases Reference

In-Depth Information

Now recall the predicate for relvar S (see the answer to Exercise 2.6 in Appendix D):

Supplier SNO is named SNAME and is located in city CITY, which has status STATUS.

This latter predicate is clearly not the same as the predicate for the join. To be more precise, if some given tuple
t

satisfies it, then that tuple
t
also satisfies the predicate for the join, but the converse isn't true. That's why the join

“loses information” or “is lossy”—just because some tuple appears in the join, we can't assume it also appears in the

original relvar S.

So what exactly is it that makes some decompositions nonloss and others lossy? This is the question that lies

at the heart of normalization theory. It can be stated formally thus:

Let
r
be a relation and let
r1
, ...,
rn
be projections of
r
. What conditions must be satisfied in order for
r
to be

equal to the join of those projections? (By the way, note the tacit assumption here that—as noted earlier—

join is an
n-
adic operator.)

An important, albeit partial, answer to this question was provided by Ian Heath in 1971 when he proved the

following theorem:

Heath's Theorem
(
for relations
): Let relation
r
have heading
H
and let
X
,
Y
, and
Z
be subsets of
H
such that

the union of
X
,
Y
, and
Z
is equal to
H
. Let
XY
denote the union of
X
and
Y
, and similarly for
XZ
. If
r
satisfies

the FD
X
→
Y
, then
r
is equal to the join of its projections on
XY
and
XZ
.

By way of example, consider the suppliers relation once again (i.e., the current value of relvar S as shown in

Fig. 1.1). That relation satisfies the FD {CITY} → {STATUS}. Thus, taking
X
as {CITY},
Y
as {STATUS}, and
Z

as {SNO,SNAME}, Heath's Theorem tells us that the decomposition of that relation into its projections on

{CITY,STATUS} and {CITY,SNO,SNAME}
5
is nonloss—as indeed we already know.

Now, it's important to understand that (to repeat) Heath's answer to the original question was only partial.

I'll explain what this means in terms of the foregoing example. Basically, the theorem does tell us the

decomposition into projections SNC and CT (see Fig. 3.2 in Chapter 3) is nonloss; however, it doesn't tell us the

one into SNT and CT (see Fig. 5.1) is lossy. In other words, if we decompose on the basis of an FD, as we did in the

example of Fig. 3.2, then Heath's Theorem says the decomposition will be nonloss; but if we decompose on some

other basis, as we did in the example of Fig. 5.1, then the theorem has nothing to say on the matter. Thus, the

theorem gives a sufficient condition, but not a necessary one, for a given (binary) decomposition to be nonloss. It

follows that it might be possible to decompose relation
r
in a nonloss way into its projections on
XY
and
XZ
even if

it doesn't satisfy the FD
X
→
Y
.
Note:
I'll be describing a stronger form of Heath's Theorem, giving both necessary

and sufficient conditions, later in this topic (see Chapter 12).

As an aside, I remark that in the paper in which he proved his theorem, Heath also gave a definition of what

he called “third” normal form that was in fact a definition of BCNF. Since that definition preceded Boyce and

Codd's own definition by some three years, it seems to me that BCNF ought by rights to be called
Heath
normal

form. But it isn't.

Now, in Chapter 3, in the section “Normalization Serves Two Purposes,” I said something like the following:

If you've been paying careful attention, you might reasonably accuse me of practicing a tiny deception in the foregoing

discussion. To be specific, I've considered what it means for a decomposition of
relations
to be nonloss; but

5
Or, as we would “more naturally” tend to write them, interchanging the two sets of attributes and specifying the individual attributes in a “more

natural” order, on {SNO,SNAME,CITY} and {CITY,STATUS}.