Denormalization - Database Design and Relational Theory




First, this area is one in which most DBMS vendors have seriously let us down; most SQL products do

indeed map relvars one for one to stored files, pretty much.[2] Even the exceptions fail to provide us with as

much data independence as we might like, or as much as relational systems are theoretically capable of. As a

practical matter, therefore, that “specious” argument is, sadly, valid for most SQL products today.



Second, even if relvars didn't map one for one to stored files, denormalization might still be desirable at the

stored file level. Indeed, a major reason why mappings that aren't one for one would be desirable is precisely

that they would permit denormalization to be done at the physical level, where it belongs, without it having

to show through to—and thereby corrupt—the logical level.
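
A minimal sketch of that idea, assuming SQLite through Python's sqlite3 module and hypothetical table and view names throughout. One denormalized stored table stands in for the physical level, and two views stand in for the normalized relvars the user would see. SQL views are only a rough approximation of a genuine relvar-to-stored-file mapping, and the sketch ignores suppliers that have no shipments, which a stored join like this one cannot represent.

```python
import sqlite3

# In-memory database; every name below is hypothetical and for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Stand-in for the physical level: a single denormalized stored table
    -- holding the join of suppliers and shipments.
    CREATE TABLE stored_supplier_shipments (
        sno  TEXT    NOT NULL,
        city TEXT    NOT NULL,
        pno  TEXT    NOT NULL,
        qty  INTEGER NOT NULL,
        PRIMARY KEY (sno, pno)
    );
    INSERT INTO stored_supplier_shipments VALUES
        ('S1', 'London', 'P1', 300),
        ('S1', 'London', 'P2', 200),
        ('S2', 'Paris',  'P1', 300);

    -- Stand-ins for the logical level: normalized relvars defined as
    -- projections of the stored table.
    CREATE VIEW supplier AS
        SELECT DISTINCT sno, city FROM stored_supplier_shipments;
    CREATE VIEW shipment AS
        SELECT sno, pno, qty FROM stored_supplier_shipments;
""")

# Queries are written against the normalized views only; the redundancy
# (the city repeated per shipment) stays hidden in the stored table.
suppliers = conn.execute(
    "SELECT sno, city FROM supplier ORDER BY sno"
).fetchall()
shipments = conn.execute(
    "SELECT sno, pno, qty FROM shipment ORDER BY sno, pno"
).fetchall()

print(suppliers)
print(shipments)
```

The point of the design is that the stored layout could be changed, say back to two stored tables, by redefining the views, without touching any query written against supplier and shipment.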

So I'll assume for the sake of discussion that denormalization does sometimes have to be done, at some level

or other. But what is denormalization?

WHAT DOES DENORMALIZATION MEAN?

Curiously, for a practice that's so widely advocated, there seems to be considerable confusion over what

denormalization actually consists of. (The textbooks aren't much help, either, even those that specialize in design

topics; most of them don't even mention it, and those that do rarely offer a definition, and they certainly don't

discuss the matter in much depth.) For example, a while back I had occasion to read a paper specifically devoted to

the question of denormalization in commercial SQL products.[3] I'll refer to that paper as “the denormalization

paper” in what follows. Now, the author begins by arguing against denormalization. To quote:

I think the normalization principles should be treated as commandments ... unless you're faced with performance

problems that money, hardware scalability, current SQL technology, network optimization, parallelization, or other

performance techniques can't resolve [slightly reworded, boldface added].

I couldn't agree more with this position. Indeed, I'm on record as saying very much the same thing myself:

In a paper I wrote in 1990 on the use of SQL systems in practice,[4] I recommended denormalization as a performance

tactic “only if all else fails.” Unfortunately, however, the rest of the denormalization paper tends to suggest that the

author doesn't really know what denormalization is; after the opening position statement quoted above, the paper

goes on to give some eight examples of “designing for performance,” all but one of which have absolutely nothing

to do with denormalization at all!

In the author's defense, however, I say again that it does seem to be difficult to find a precise definition of

denormalization in the literature. Of course, it could be argued that no such definition is needed, given that

(a) denormalization, whatever else it might be, must surely be the inverse of normalization, and (b) normalization in

turn certainly is precisely defined. For the record, however, I'll give some idea as to what a precise definition of

denormalization might look like in just a moment. Before I do, however, let me make it clear that I have no

particular quarrel with the specific design tactics suggested in the denormalization paper; indeed, I suggested several

[2] I realize the mapping from relvars to stored files isn't always exactly one to one as I'm suggesting here; for example, some products allow

several relvars to share the same stored file, and some allow a single relvar to span several stored files. But these facts don't significantly affect

the bigger picture, and I ignore them here for simplicity.

[3] Sam Hamdan: “Denormalization and SQL-DBMS,” SQL Forum 4, No. 1 (January/February 1995).

[4] “SQL Dos and Don'ts,” in Relational Database Writings 1985-1989 (Addison-Wesley, 1990).
