Databases Reference
In-Depth Information
At the logical level, then, it's tempting just to say that redundancy is always bad. But of course this
statement is much too simplistic, owing to the availability of the view mechanism if nothing else. Let me digress for
a moment to elaborate on this latter point. It's well known, but worth stating explicitly nevertheless, that views (like
normalization, though for very different reasons) serve two rather different purposes:
The user who actually defines view V is, obviously, aware of the expression X in terms of which V is defined.
That user can use the name V wherever the expression X is intended, but such uses are basically just
shorthand (like the use of macros in a programming language).
By contrast, a user who's merely informed that view V exists and is available for use is supposed not to be
aware of the expression X ; to that user, in fact, V is supposed to look and feel just like a base relvar. 3
As an example of Case 1, suppose the user perceives the database as containing two relvars R1 and R2 and
goes on to define their join as a view; clearly, then, that view is redundant so far as that user is concerned, and it
could be dropped without any loss of information. For definiteness, therefore, I'm going to assume from this point
forward (barring explicit statements to the contrary) that no relvar in the database is defined in terms of any others,
so that at least this particular kind of redundancy isn't usually present. With this possibility ruled out, then, it's
tempting to set a stake in the ground and say again that redundancy at the logical level is always undesirable. In
order to adopt such a position, however, we need to be able to say what we mean by redundancy─for otherwise the
position can't possibly make sense. And even if we can come up with a good definition of the term, is the position
(that redundancy at the logical level is always bad, that is) really tenable? Is it possible to eliminate all redundancy?
Is it even desirable?
These are questions of considerable pragmatic importance, of course. Indeed, I think it's noteworthy that
Codd called his very first (1969) paper on the relational model “Derivability, Redundancy , and Consistency of
Relations Stored in Large Data Banks” (boldface added; see Appendix C). And his second (1970) paper, “A
Relational Model of Data for Large Shared Data Banks” (again, see Appendix C)─this is the one that's usually
regarded as the seminal paper in the field, though that characterization is a little unfair to its 1969 predecessor─was
in two parts of almost equal length, the second of which was called “Redundancy and Consistency” (the first was
called “Relational Model and Normal Form”). Codd thus clearly regarded his thoughts on redundancy as a major
part of the contribution of his relational work: rightly so, in my opinion, since he did at least provide us with a
framework in which we could begin to address the issue precisely and systematically.
Now, I showed in the previous chapter that one “definition” of redundancy that doesn't work is this: The
database involves redundancy if and only if it contains two distinct appearances of the same tuple. But we can
validly say the following:
Definition: The database involves redundancy if and only if it contains, directly or indirectly, two distinct
representations of the same proposition.
The trouble is, although this definition is clearly correct, it doesn't help much with the practical problem of
reducing redundancy. But it does at least imply the following, which is a little better:
Definition ( preliminary version ): Let D be a database design; let DB be a database value (i.e., a set of values
for the relvars mentioned in D ) that conforms to D ; and let p be a proposition. Further, let DB contain some
specific appearance of some tuple, or some combination of tuples, that represents p (either explicitly or
3 Emphasis on supposed ─I'm describing an ideal situation here. Today's reality is rather messier, as I'm sure you know.
Search WWH ::

Custom Search