We Need More Science - Database Design and Relational Theory

Databases Reference

In-Depth Information

implicitly). If DB additionally contains some distinct appearance of some tuple, or some combination of

tuples, that also represents p (either explicitly or implicitly), then DB contains, and D permits, redundancy .

The principles of normalization and The Principle of Orthogonal Design are aimed precisely at reducing

redundancy in the foregoing narrow sense. Observe, however, that all the definition says is if (not if and only if )

certain tuples appear, then there's redundancy; i.e., it's not a complete definition. Indeed, we'll see several

examples later in this chapter that still involve redundancy, even though they don't contain distinct tuples or tuple

combinations that represent the same proposition. What's more, most of the examples in question are fully

normalized and fully orthogonal. In other words, the principles of normalization and orthogonality, though

necessary and undoubtedly important, are far from being sufficient.

A LITTLE HISTORY

Before I get into a discussion of how and why normalization and orthogonality are insufficient, I'd like to say a little

more about Codd's attempts, in his very first two papers, to address the redundancy issue. In his 1969 paper, he said

this:

A set of relations is strongly redundant if it contains at least one relation [that] is derivable from the rest of the [relations

in the set].

And he tightened up this definition somewhat in the 1970 paper:

A set of relations is strongly redundant if it contains at least one relation that possesses a projection [that] is derivable

from other projections of relations in the set.

I should explain that when Codd says a relation r is derivable from a set S of relations, he means r is equal to

the result of applying some sequence of relational operations (join, projection, and so forth) to relations from S . I do

have a few comments on his definitions, however:



First, the term relation should be replaced by the term relvar throughout. (Of course, this latter term wasn't

introduced until many years later, and Codd never used it at all.)



Second, we can ignore the qualifier strongly . Codd was distinguishing between “strong” redundancy and

what he called weak redundancy, but weak redundancy is irrelevant as far as we're concerned. The reason is

that weak redundancy has to do merely with equality dependencies that don't hold at all times but do happen

to hold at particular times, given the relation values that happen to exist at the times in question. In fact, it

seems to me that Codd was struggling here with the logical difference between relations and relvars!─see the

previous bullet item. “Strong” redundancy applies to relvars (it's what we usually mean by redundancy

when we talk about database design). “Weak” redundancy, by contrast, applies to relations, not relvars (it's

just an artifact of the values the relvars happen to have at some particular time).



The 1969 definition is subsumed by the 1970 definition, of course, because (as we know from Chapter 6)

every relvar R is identically equal to a certain projection of R ─namely, the identity projection.



More to the point, the 1970 definition is still deficient as a definition of redundancy in general, for at least the

following two reasons:

Search WWH ::

Custom Search

Home