Databases Reference
In-Depth Information
implicitly). If
DB
additionally contains some distinct appearance of some tuple, or some combination of
tuples, that also represents
p
(either explicitly or implicitly), then
DB
contains, and
D
permits,
redundancy
.
The principles of normalization and
The Principle of Orthogonal Design
are aimed precisely at reducing
redundancy in the foregoing narrow sense. Observe, however, that all the definition says is
if
(not
if and only if
)
certain tuples appear,
then
there's redundancy; i.e., it's not a complete definition. Indeed, we'll see several
examples later in this chapter that still involve redundancy, even though they don't contain distinct tuples or tuple
combinations that represent the same proposition. What's more, most of the examples in question are fully
normalized and fully orthogonal. In other words, the principles of normalization and orthogonality, though
necessary and undoubtedly important, are far from being sufficient.
A LITTLE HISTORY
Before I get into a discussion of how and why normalization and orthogonality are insufficient, I'd like to say a little
more about Codd's attempts, in his very first two papers, to address the redundancy issue. In his 1969 paper, he said
this:
A set of relations is
strongly redundant
if it contains at least one relation [that] is derivable from the rest of the [relations
in the set].
And he tightened up this definition somewhat in the 1970 paper:
A set of relations is
strongly redundant
if it contains at least one relation that possesses a projection [that] is derivable
from other projections of relations in the set.
I should explain that when Codd says a relation
r
is
derivable
from a set
S
of relations, he means
r
is equal to
the result of applying some sequence of relational operations (join, projection, and so forth) to relations from
S
. I do
have a few comments on his definitions, however:
First, the term
relation
should be replaced by the term
relvar
throughout. (Of course, this latter term wasn't
introduced until many years later, and Codd never used it at all.)
Second, we can ignore the qualifier
strongly
. Codd was distinguishing between “strong” redundancy and
what he called
weak
redundancy, but weak redundancy is irrelevant as far as we're concerned. The reason is
that weak redundancy has to do merely with equality dependencies that don't hold at all times but do happen
to hold at particular times, given the relation values that happen to exist at the times in question. In fact, it
seems to me that Codd was struggling here with the logical difference between relations and relvars!─see the
previous bullet item. “Strong” redundancy applies to relvars (it's what we usually mean by
redundancy
when we talk about database design). “Weak” redundancy, by contrast, applies to relations, not relvars (it's
just an artifact of the values the relvars happen to have at some particular time).
The 1969 definition is subsumed by the 1970 definition, of course, because (as we know from Chapter 6)
every relvar
R
is identically equal to a certain projection of
R
─namely, the identity projection.
More to the point, the 1970 definition is still deficient as a definition of redundancy in general, for at least the
following two reasons: