Databases Reference

In-Depth Information

implicitly). If
DB
additionally contains some distinct appearance of some tuple, or some combination of

tuples, that also represents
p
(either explicitly or implicitly), then
DB
contains, and
D
permits,
redundancy
.

The principles of normalization and
The Principle of Orthogonal Design
are aimed precisely at reducing

redundancy in the foregoing narrow sense. Observe, however, that all the definition says is
if
(not
if and only if
)

certain tuples appear,
then
there's redundancy; i.e., it's not a complete definition. Indeed, we'll see several

examples later in this chapter that still involve redundancy, even though they don't contain distinct tuples or tuple

combinations that represent the same proposition. What's more, most of the examples in question are fully

normalized and fully orthogonal. In other words, the principles of normalization and orthogonality, though

necessary and undoubtedly important, are far from being sufficient.

A LITTLE HISTORY

Before I get into a discussion of how and why normalization and orthogonality are insufficient, I'd like to say a little

more about Codd's attempts, in his very first two papers, to address the redundancy issue. In his 1969 paper, he said

this:

A set of relations is
strongly redundant
if it contains at least one relation [that] is derivable from the rest of the [relations

in the set].

And he tightened up this definition somewhat in the 1970 paper:

A set of relations is
strongly redundant
if it contains at least one relation that possesses a projection [that] is derivable

from other projections of relations in the set.

I should explain that when Codd says a relation
r
is
derivable
from a set
S
of relations, he means
r
is equal to

the result of applying some sequence of relational operations (join, projection, and so forth) to relations from
S
. I do

have a few comments on his definitions, however:

First, the term
relation
should be replaced by the term
relvar
throughout. (Of course, this latter term wasn't

introduced until many years later, and Codd never used it at all.)

Second, we can ignore the qualifier
strongly
. Codd was distinguishing between “strong” redundancy and

what he called
weak
redundancy, but weak redundancy is irrelevant as far as we're concerned. The reason is

that weak redundancy has to do merely with equality dependencies that don't hold at all times but do happen

to hold at particular times, given the relation values that happen to exist at the times in question. In fact, it

seems to me that Codd was struggling here with the logical difference between relations and relvars!─see the

previous bullet item. “Strong” redundancy applies to relvars (it's what we usually mean by
redundancy

when we talk about database design). “Weak” redundancy, by contrast, applies to relations, not relvars (it's

just an artifact of the values the relvars happen to have at some particular time).

The 1969 definition is subsumed by the 1970 definition, of course, because (as we know from Chapter 6)

every relvar
R
is identically equal to a certain projection of
R
─namely, the identity projection.

More to the point, the 1970 definition is still deficient as a definition of redundancy in general, for at least the

following two reasons: