
of those same tactics in a paper I wrote myself back in 1982.⁵ My quarrel is only with the fact that it refers to them as denormalization tactics specifically.

So here's my own definition, for what it's worth (and I apologize if it seems a little lengthy). I start with the observation that normalizing relvar R means decreasing redundancy by:

1. Replacing R by a set of projections R1, ..., Rn such that at least one of R1, ..., Rn is at a higher level of normalization than R, and such that also

2. For all possible values r of R, if the corresponding values r1, ..., rn of R1, ..., Rn (respectively) are joined back together again, then the result of that join is equal to r.
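To make the two conditions concrete, here is a minimal sketch in Python, using the suppliers relvar S with heading {SNO,SNAME,CITY,STATUS}; the specific rows below are illustrative only, chosen so that the FD {CITY} → {STATUS} holds, as in the design discussed in Chapter 3:

    # A possible value s of relvar S, represented as a set of tuples with
    # attribute order (SNO, SNAME, CITY, STATUS); rows are illustrative only.
    s = {('S1', 'Smith', 'London', 20),
         ('S2', 'Jones', 'Paris',  10),
         ('S3', 'Blake', 'Paris',  10)}

    # Condition 1: replace S by the projections SNC {SNO,SNAME,CITY} and
    # CT {CITY,STATUS}, each at a higher level of normalization than S.
    snc = {(sno, sname, city) for (sno, sname, city, _) in s}
    ct  = {(city, status) for (_, _, city, status) in s}

    # Condition 2: joining the projections back together (over their common
    # attribute CITY) must yield s again; the decomposition is nonloss.
    joined = {(sno, sname, city, status)
              for (sno, sname, city) in snc
              for (city2, status) in ct
              if city == city2}
    assert joined == s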

Hence the following definition:

Definition: Denormalizing a set of relvars R1, ..., Rn means increasing redundancy by:

1. Replacing R1, ..., Rn by their join R such that R is at a lower level of normalization than at least one of R1, ..., Rn, and such that also

2. For all possible values r1, ..., rn of R1, ..., Rn (respectively), the result of projecting the corresponding value r of R over the attributes of Ri is equal to ri (i = 1, ..., n).
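Dually, here is a sketch of this definition in the same style (sample values again illustrative): condition 1 forms the join of the given SNC and CT values, and condition 2 checks that projecting that join over each original heading gives the original values back:

    # Assumed values of SNC {SNO,SNAME,CITY} and CT {CITY,STATUS};
    # these are illustrative, not a copy of Fig. 3.2.
    snc = {('S1', 'Smith', 'London'),
           ('S2', 'Jones', 'Paris'),
           ('S3', 'Blake', 'Paris')}
    ct  = {('London', 20), ('Paris', 10)}

    # Condition 1: replace SNC and CT by their join S, whose heading is
    # {SNO,SNAME,CITY,STATUS} and which is at a lower level of normalization.
    s = {(sno, sname, city, status)
         for (sno, sname, city) in snc
         for (city2, status) in ct
         if city == city2}

    # Condition 2: projecting s over the attributes of each original relvar
    # must recover that relvar's value exactly.
    assert {(sno, sname, city) for (sno, sname, city, _) in s} == snc
    assert {(city, status) for (_, _, city, status) in s} == ct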

Points arising:

• Observe that denormalization is a process that applies to a set of relvars, not to an individual relvar considered in isolation. For example, consider relvars SNC and CT, with headings {SNO,SNAME,CITY} and {CITY,STATUS}, respectively (see Fig. 3.2 for some sample values). These two relvars are in BCNF. If we join them together, we get the suppliers relvar S (which is only in 2NF, not in 3NF, and therefore not in BCNF either), and so relvar S can be regarded as a denormalization of relvars SNC and CT. What's more, of course, relvar S involves more redundancy than relvars SNC and CT do.

• If (a) R1, ..., Rn were obtained by taking projections of R in the first place—in other words, if the denormalization is really undoing an earlier normalization, so to speak, as in the suppliers example in the previous bullet item—and if also (b) that earlier normalization was done purely to decrease redundancy and not to fix a logically incorrect design (see the remarks in Chapter 3 on the difference between these two possibilities), then (c) the requirement that for all possible values r of R, projecting r over the attributes of Ri must yield ri (i = 1, ..., n) will be met automatically.
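By contrast, if condition (a) doesn't hold, that is, if SNC and CT can take values that aren't both projections of a single value of S, the requirement can fail. A small illustration with assumed values: let CT mention a city in which no supplier is located, and that tuple can't be recovered by projecting the join:

    # Assumed values: CT mentions Rome, but no supplier in SNC is in Rome.
    snc = {('S1', 'Smith', 'London'), ('S2', 'Jones', 'Paris')}
    ct  = {('London', 20), ('Paris', 10), ('Rome', 30)}

    s = {(sno, sname, city, status)
         for (sno, sname, city) in snc
         for (city2, status) in ct
         if city == city2}

    # Projecting s over {CITY,STATUS} loses the ('Rome', 30) tuple, so the
    # requirement in point 2 is violated for these values of SNC and CT.
    assert {(city, status) for (_, _, city, status) in s} == {('London', 20), ('Paris', 10)}
    assert {(city, status) for (_, _, city, status) in s} != ct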

The argument in favor of denormalization is basically that it makes retrievals easier to express and makes them perform better.⁶ To what extent this argument might be valid I'll examine in a later section. First, however, I'd like to point out that once we make the decision to denormalize, we've embarked on a very slippery slope. The question is: Where do we stop? The situation is different with normalization, where there are clear logical reasons for continuing the process until we reach the highest possible normal form. Do we then conclude that with denormalization we should proceed until we reach the lowest possible normal form? Surely not; yet there are no logical criteria for deciding exactly where the process should stop. In choosing to denormalize, in other words,

⁵ “A Practical Approach to Database Design,” in Relational Database: Selected Writings (Addison-Wesley, 1986).

⁶ It's also sometimes claimed to make the database easier to understand. Exercise 8.2 addresses this particular issue.