
of those same tactics in a paper I wrote myself back in 1982.⁵ My quarrel is only with the fact that it refers to them as denormalization tactics specifically.

So here's my own definition, for what it's worth (and I apologize if it seems a little lengthy). I start with the observation that normalizing relvar R means decreasing redundancy by:

1. Replacing R by a set of projections R1, ..., Rn such that at least one of R1, ..., Rn is at a higher level of normalization than R, and such that also

2. For all possible values r of R, if the corresponding values r1, ..., rn of R1, ..., Rn (respectively) are joined back together again, then the result of that join is equal to r.
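To make the two conditions concrete, here is a minimal sketch in Python, using the suppliers relvar S with heading {SNO,SNAME,CITY,STATUS}; the specific rows below are illustrative only, chosen so that the FD {CITY} → {STATUS} holds, as in the design discussed in Chapter 3:

    # A possible value s of relvar S, represented as a set of tuples with
    # attribute order (SNO, SNAME, CITY, STATUS); rows are illustrative only.
    s = {('S1', 'Smith', 'London', 20),
         ('S2', 'Jones', 'Paris',  10),
         ('S3', 'Blake', 'Paris',  10)}

    # Condition 1: replace S by the projections SNC {SNO,SNAME,CITY} and
    # CT {CITY,STATUS}, each at a higher level of normalization than S.
    snc = {(sno, sname, city) for (sno, sname, city, _) in s}
    ct  = {(city, status) for (_, _, city, status) in s}

    # Condition 2: joining the projections back together (over their common
    # attribute CITY) must yield s again; the decomposition is nonloss.
    joined = {(sno, sname, city, status)
              for (sno, sname, city) in snc
              for (city2, status) in ct
              if city == city2}
    assert joined == s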

Hence the following definition:

Definition: Denormalizing a set of relvars R1, ..., Rn means increasing redundancy by:

1. Replacing R1, ..., Rn by their join R such that R is at a lower level of normalization than at least one of R1, ..., Rn, and such that also

2. For all possible values r1, ..., rn of R1, ..., Rn (respectively), the result of projecting the corresponding value r of R over the attributes of Ri is equal to ri (i = 1, ..., n).
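Dually, here is a sketch of this definition in the same style (sample values again illustrative): condition 1 forms the join of the given SNC and CT values, and condition 2 checks that projecting that join over each original heading gives the original values back:

    # Assumed values of SNC {SNO,SNAME,CITY} and CT {CITY,STATUS};
    # these are illustrative, not a copy of Fig. 3.2.
    snc = {('S1', 'Smith', 'London'),
           ('S2', 'Jones', 'Paris'),
           ('S3', 'Blake', 'Paris')}
    ct  = {('London', 20), ('Paris', 10)}

    # Condition 1: replace SNC and CT by their join S, whose heading is
    # {SNO,SNAME,CITY,STATUS} and which is at a lower level of normalization.
    s = {(sno, sname, city, status)
         for (sno, sname, city) in snc
         for (city2, status) in ct
         if city == city2}

    # Condition 2: projecting s over the attributes of each original relvar
    # must recover that relvar's value exactly.
    assert {(sno, sname, city) for (sno, sname, city, _) in s} == snc
    assert {(city, status) for (_, _, city, status) in s} == ct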

Points arising:

• Observe that denormalization is a process that applies to a set of relvars, not to an individual relvar considered in isolation. For example, consider relvars SNC and CT, with headings {SNO,SNAME,CITY} and {CITY,STATUS}, respectively (see Fig. 3.2 for some sample values). These two relvars are in BCNF. If we join them together, we get the suppliers relvar S (which is only in 2NF, not in 3NF, and therefore not in BCNF either), and so relvar S can be regarded as a denormalization of relvars SNC and CT. What's more, of course, relvar S involves more redundancy than relvars SNC and CT do.

• If (a) R1, ..., Rn were obtained by taking projections of R in the first place—in other words, if the denormalization is really undoing an earlier normalization, so to speak, as in the suppliers example in the previous bullet item—and if also (b) that earlier normalization was done purely to decrease redundancy and not to fix a logically incorrect design (see the remarks in Chapter 3 on the difference between these two possibilities), then (c) the requirement that for all possible values r of R, projecting r over the attributes of Ri must yield ri (i = 1, ..., n) will be met automatically.
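By contrast, if condition (a) doesn't hold, that is, if SNC and CT can take values that aren't both projections of a single value of S, the requirement can fail. A small illustration with assumed values: let CT mention a city in which no supplier is located, and that tuple can't be recovered by projecting the join:

    # Assumed values: CT mentions Rome, but no supplier in SNC is in Rome.
    snc = {('S1', 'Smith', 'London'), ('S2', 'Jones', 'Paris')}
    ct  = {('London', 20), ('Paris', 10), ('Rome', 30)}

    s = {(sno, sname, city, status)
         for (sno, sname, city) in snc
         for (city2, status) in ct
         if city == city2}

    # Projecting s over {CITY,STATUS} loses the ('Rome', 30) tuple, so the
    # requirement in point 2 is violated for these values of SNC and CT.
    assert {(city, status) for (_, _, city, status) in s} == {('London', 20), ('Paris', 10)}
    assert {(city, status) for (_, _, city, status) in s} != ct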

The argument in favor of denormalization is basically that it makes retrievals easier to express and makes them perform better.⁶ To what extent this argument might be valid I'll examine in a later section. First, however, I'd like to point out that once we make the decision to denormalize, we've embarked on a very slippery slope. The question is: Where do we stop? The situation is different with normalization, where there are clear logical reasons for continuing the process until we reach the highest possible normal form. Do we then conclude that with denormalization we should proceed until we reach the lowest possible normal form? Surely not; yet there are no logical criteria for deciding exactly where the process should stop. In choosing to denormalize, in other words,

⁵ “A Practical Approach to Database Design,” in Relational Database: Selected Writings (Addison-Wesley, 1986).

⁶ It's also sometimes claimed to make the database easier to understand. Exercise 8.2 addresses this particular issue.