Databases Reference
In-Depth Information
These predicates aren't existentially quantified, and so the corresponding propositions aren't, either.
21
c.
Relvar S certainly contains subtuples corresponding to SN, SC, and CT; however, those corresponding to SN
and SC are never repeated because {SNO} is a key. By contrast, those corresponding to CT are repeated, at
least potentially (as we know from Fig. 1.1), and such repetition does constitute redundancy.
With all of the foregoing by way of motivation, then, I offer the following as a putative “final” definition of
what it means for a database to exhibit redundancy:
Definition
(
“final” version
): Let
D
be a database design; let
DB
be a database value (i.e., a set of values for
the relvars mentioned in
D
) that conforms to
D
; and let
p
be a proposition not involving any existential
quantification. If
DB
contains two or more distinct representations of
p
(either implicitly or implicitly), then
DB
contains, and
D
permits,
redundancy
.
Observe in particular that a database can still display redundancy by this definition, even if it fully conforms
to
The Principle of Orthogonal Design
and all normalization principles. Note, however, that the definition still says
if
(not
if and only if
) a certain condition holds,
then
there's redundancy; I'd like to replace that
if
by
if and only if
,
but I don't quite have the courage of my convictions here. Not yet.
Be that as it may, let's consider Examples 1-12 from earlier sections of this chapter and see what the
implications are for those examples specifically.
Please note carefully:
For simplicity, I use the unqualified term
proposition
throughout the following analysis to mean a proposition that's not existentially quantified.
Examples 1-2
Both of these examples display redundancy because the proposition
City SFO and state CA are the city and state for
zip 94100
is represented twice.
Example 3
Suppose two distinct tuples both contain the DATE value “Tuesday, January 18th, 2011”; then the database clearly
displays redundancy because the proposition
January 18th, 2011 is a Tuesday
is represented twice, explicitly. In
fact, there's redundancy even if that DATE value appears just once! The reason is that even in that case, the
proposition
January 18th, 2011 is a Tuesday
is represented both explicitly and implicitly, as a result of the fact that
one part of the value, the day of the week (Tuesday, in the example), can be determined algorithmically from the rest
of the value (January 18th, 2011, in the example).
Example 4
Let employee E1 be represented in both relvar EMP and relvar PGMR. The corresponding propositions are
E1 is an
employee
and
E1 is a programmer
. The former proposition is a logical consequence of the combination of the latter
proposition together with the proposition
All programmers are employees
. (Note that this latter proposition is
represented, in effect, by the fact that {ENO} in PGMR is a foreign key referencing {ENO} in EMP.) Thus, the
proposition
E1 is an employee
is represented twice, once explicitly and once implicitly.
21
In connection with the lack of quantification in the predicate for CT in particular, you might want to take another look at the section
“Normalization Serves Two Purposes” in Chapter 3.