
of those same tactics in a paper I wrote myself back in 1982.⁵ My quarrel is only with the fact that it refers to them as denormalization tactics specifically.

So here's my own definition, for what it's worth (and I apologize if it seems a little lengthy). I start with the observation that normalizing relvar R means decreasing redundancy by:

1. Replacing R by a set of projections R1, ..., Rn such that at least one of R1, ..., Rn is at a higher level of normalization than R, and such that also

2. For all possible values r of R, if the corresponding values r1, ..., rn of R1, ..., Rn (respectively) are joined back together again, then the result of that join is equal to r.
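To make the two conditions concrete, here is a minimal sketch in Python, using the suppliers relvar S with heading {SNO,SNAME,CITY,STATUS}; the specific rows below are illustrative only, chosen so that the FD {CITY} → {STATUS} holds, as in the design discussed in Chapter 3:

    # A possible value s of relvar S, represented as a set of tuples with
    # attribute order (SNO, SNAME, CITY, STATUS); rows are illustrative only.
    s = {('S1', 'Smith', 'London', 20),
         ('S2', 'Jones', 'Paris',  10),
         ('S3', 'Blake', 'Paris',  10)}

    # Condition 1: replace S by the projections SNC {SNO,SNAME,CITY} and
    # CT {CITY,STATUS}, each at a higher level of normalization than S.
    snc = {(sno, sname, city) for (sno, sname, city, _) in s}
    ct  = {(city, status) for (_, _, city, status) in s}

    # Condition 2: joining the projections back together (over their common
    # attribute CITY) must yield s again; the decomposition is nonloss.
    joined = {(sno, sname, city, status)
              for (sno, sname, city) in snc
              for (city2, status) in ct
              if city == city2}
    assert joined == s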

Hence the following definition:

Definition: Denormalizing a set of relvars R1, ..., Rn means increasing redundancy by:

1. Replacing R1, ..., Rn by their join R such that R is at a lower level of normalization than at least one of R1, ..., Rn, and such that also

2. For all possible values r1, ..., rn of R1, ..., Rn (respectively), the result of projecting the corresponding value r of R over the attributes of Ri is equal to ri (i = 1, ..., n).
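Dually, here is a sketch of this definition in the same style (sample values again illustrative): condition 1 forms the join of the given SNC and CT values, and condition 2 checks that projecting that join over each original heading gives the original values back:

    # Assumed values of SNC {SNO,SNAME,CITY} and CT {CITY,STATUS};
    # these are illustrative, not a copy of Fig. 3.2.
    snc = {('S1', 'Smith', 'London'),
           ('S2', 'Jones', 'Paris'),
           ('S3', 'Blake', 'Paris')}
    ct  = {('London', 20), ('Paris', 10)}

    # Condition 1: replace SNC and CT by their join S, whose heading is
    # {SNO,SNAME,CITY,STATUS} and which is at a lower level of normalization.
    s = {(sno, sname, city, status)
         for (sno, sname, city) in snc
         for (city2, status) in ct
         if city == city2}

    # Condition 2: projecting s over the attributes of each original relvar
    # must recover that relvar's value exactly.
    assert {(sno, sname, city) for (sno, sname, city, _) in s} == snc
    assert {(city, status) for (_, _, city, status) in s} == ct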

Points arising:

• Observe that denormalization is a process that applies to a set of relvars, not to an individual relvar considered in isolation. For example, consider relvars SNC and CT, with headings {SNO,SNAME,CITY} and {CITY,STATUS}, respectively (see Fig. 3.2 for some sample values). These two relvars are in BCNF. If we join them together, we get the suppliers relvar S (which is only in 2NF, not in 3NF, and therefore not in BCNF either), and so relvar S can be regarded as a denormalization of relvars SNC and CT. What's more, of course, relvar S involves more redundancy than relvars SNC and CT do.

• If (a) R1, ..., Rn were obtained by taking projections of R in the first place—in other words, if the denormalization is really undoing an earlier normalization, so to speak, as in the suppliers example in the previous bullet item—and if also (b) that earlier normalization was done purely to decrease redundancy and not to fix a logically incorrect design (see the remarks in Chapter 3 on the difference between these two possibilities), then (c) the requirement that for all possible values r of R, projecting r over the attributes of Ri must yield ri (i = 1, ..., n) will be met automatically.
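By contrast, if condition (a) doesn't hold, that is, if SNC and CT can take values that aren't both projections of a single value of S, the requirement can fail. A small illustration with assumed values: let CT mention a city in which no supplier is located, and that tuple can't be recovered by projecting the join:

    # Assumed values: CT mentions Rome, but no supplier in SNC is in Rome.
    snc = {('S1', 'Smith', 'London'), ('S2', 'Jones', 'Paris')}
    ct  = {('London', 20), ('Paris', 10), ('Rome', 30)}

    s = {(sno, sname, city, status)
         for (sno, sname, city) in snc
         for (city2, status) in ct
         if city == city2}

    # Projecting s over {CITY,STATUS} loses the ('Rome', 30) tuple, so the
    # requirement in point 2 is violated for these values of SNC and CT.
    assert {(city, status) for (_, _, city, status) in s} == {('London', 20), ('Paris', 10)}
    assert {(city, status) for (_, _, city, status) in s} != ct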

The argument in favor of denormalization is basically that it makes retrievals easier to express and makes them perform better.⁶ To what extent this argument might be valid I'll examine in a later section. First, however, I'd like to point out that once we make the decision to denormalize, we've embarked on a very slippery slope. The question is: Where do we stop? The situation is different with normalization, where there are clear logical reasons for continuing the process until we reach the highest possible normal form. Do we then conclude that with denormalization we should proceed until we reach the lowest possible normal form? Surely not; yet there are no logical criteria for deciding exactly where the process should stop. In choosing to denormalize, in other words,

⁵ “A Practical Approach to Database Design,” in Relational Database: Selected Writings (Addison-Wesley, 1986).

⁶ It's also sometimes claimed to make the database easier to understand. Exercise 8.2 addresses this particular issue.