Databases Reference

In-Depth Information

So much for the preamble; now let's get down to specifics.

JOIN DEPENDENCIES─THE BASIC IDEA

Most of the time in this topic so far, I've been making a tacit assumption: namely, I've been assuming that when we

decompose some relvar, we always do so by replacing that relvar by exactly two of its projections.
3
(Note that

Heath's Theorem, which provides the formal underpinning for most of what I've had to say so far regarding nonloss

decomposition, specifically addresses decomposition into exactly two projections.) What's more, that assumption

was fully warranted, so long as our target was only BCNF─in other words, it successfully carried us as far as that

specific target. So you might be surprised to learn that there exist relvars that can't be nonloss decomposed into two

projections but can be nonloss decomposed into three (or maybe more than three).

As an aside, I note that, remarkably enough, Codd gave an example in 1969, in his very first paper on the

relational model (see Appendix C), that showed he was aware of this possibility. However, that example was

apparently overlooked by most of the paper's original readers; certainly it seems to have come as a surprise to the

research community when the possibility was rediscovered several years later (in 1977, to be precise).

Now, I said earlier, albeit loosely, that 5NF had to do with JDs that aren't implied by FDs. I can now add,

though again speaking very loosely, that it has to do with relvars that can't be nonloss decomposed into two

projections but can be nonloss decomposed into three or more. In other words, it's when these circumstances

arise─i.e., when there are JDs that aren't implied by FDs, and relvars that can only be nonloss decomposed into

more than two projections─that you do really have to come to grips with JDs and 5NF.

So what exactly do we mean when we say some JD holds in some relvar? Here's a definition:

Definition:
Let
X1
, ...,
Xn
be subsets of the heading
H
of relvar
R
; then the
join dependency
(JD)

{
X1
, ... ,
Xn
}

holds in
R
if and only if
R
can be nonloss decomposed into its projections on
X1
, ...,
Xn
(i.e., if and only if

every legal value
r
of
R
is equal to the join of the corresponding projections
r1
, ...,
rn
).
X1
, ...,
Xn
are the

components
of the JD, and the JD overall can be read as “star
X1
, ...,
Xn
” or “join
X1
, ...,
Xn
”─though I

hasten to add that “join” really isn't the mot juste here, because the join operator (at least as usually

understood) joins
relations
, and
X1
, ...,
Xn
aren't relations but headings.

By way of a simple example, consider the suppliers relvar S once again. As we know, that relvar is subject

to the FD {CITY} → {STATUS}, and so Heath's Theorem tells us it can be nonloss decomposed into its projections

on {SNO,SNAME,CITY} and {CITY,STATUS}. In other words, the following JD holds in that relvar:

{ { SNO , SNAME , CITY } , { CITY , STATUS } }

Points arising:

3
Of course, I'm referring here to an individual step in the overall process. Clearly, repeated steps─i.e., repeated individual decompositions─will

yield a result consisting of more than two projections, in general, even if each individual decomposition is into just two.