Databases Reference

In-Depth Information

These predicates aren't existentially quantified, and so the corresponding propositions aren't, either.
21

c.

Relvar S certainly contains subtuples corresponding to SN, SC, and CT; however, those corresponding to SN

and SC are never repeated because {SNO} is a key. By contrast, those corresponding to CT are repeated, at

least potentially (as we know from Fig. 1.1), and such repetition does constitute redundancy.

With all of the foregoing by way of motivation, then, I offer the following as a putative “final” definition of

what it means for a database to exhibit redundancy:

Definition
(
“final” version
): Let
D
be a database design; let
DB
be a database value (i.e., a set of values for

the relvars mentioned in
D
) that conforms to
D
; and let
p
be a proposition not involving any existential

quantification. If
DB
contains two or more distinct representations of
p
(either implicitly or implicitly), then

DB
contains, and
D
permits,
redundancy
.

Observe in particular that a database can still display redundancy by this definition, even if it fully conforms

to
The Principle of Orthogonal Design
and all normalization principles. Note, however, that the definition still says

if
(not
if and only if
) a certain condition holds,
then
there's redundancy; I'd like to replace that
if
by
if and only if
,

but I don't quite have the courage of my convictions here. Not yet.

Be that as it may, let's consider Examples 1-12 from earlier sections of this chapter and see what the

implications are for those examples specifically.
Please note carefully:
For simplicity, I use the unqualified term

proposition
throughout the following analysis to mean a proposition that's not existentially quantified.

Examples 1-2

Both of these examples display redundancy because the proposition
City SFO and state CA are the city and state for

zip 94100
is represented twice.

Example 3

Suppose two distinct tuples both contain the DATE value “Tuesday, January 18th, 2011”; then the database clearly

displays redundancy because the proposition
January 18th, 2011 is a Tuesday
is represented twice, explicitly. In

fact, there's redundancy even if that DATE value appears just once! The reason is that even in that case, the

proposition
January 18th, 2011 is a Tuesday
is represented both explicitly and implicitly, as a result of the fact that

one part of the value, the day of the week (Tuesday, in the example), can be determined algorithmically from the rest

of the value (January 18th, 2011, in the example).

Example 4

Let employee E1 be represented in both relvar EMP and relvar PGMR. The corresponding propositions are
E1 is an

employee
and
E1 is a programmer
. The former proposition is a logical consequence of the combination of the latter

proposition together with the proposition
All programmers are employees
. (Note that this latter proposition is

represented, in effect, by the fact that {ENO} in PGMR is a foreign key referencing {ENO} in EMP.) Thus, the

proposition
E1 is an employee
is represented twice, once explicitly and once implicitly.

21
In connection with the lack of quantification in the predicate for CT in particular, you might want to take another look at the section

“Normalization Serves Two Purposes” in Chapter 3.