First, this area is one in which most DBMS vendors have seriously let us down; most SQL products do
indeed map relvars one for one to stored files, pretty much. 2 Even the exceptions fail to provide us with as
much data independence as we might like, or as much as relational systems are theoretically capable of. As a
practical matter, therefore, that “specious” argument is, sadly, valid for most SQL products today.
Second, even if relvars didn't map one for one to stored files, denormalization might still be desirable at the
stored file level. Indeed, a major reason why mappings that aren't one for one would be desirable is precisely
that they would permit denormalization to be done at the physical level, where it belongs, without it having
to show through to—and thereby corrupt—the logical level.
So I'll assume for the sake of discussion that denormalization does sometimes have to be done, at some level
or other. But what is denormalization?
WHAT DOES DENORMALIZATION MEAN?
Curiously, for a practice that's so widely advocated, there seems to be considerable confusion over what
denormalization actually consists of. (The textbooks aren't much help, either, even those that specialize in design
topics; most of them don't even mention it, and those that do rarely offer a definition, and they certainly don't
discuss the matter in much depth.) For example, a while back I had occasion to read a paper specifically devoted to
the question of denormalization in commercial SQL products. 3 I'll refer to that paper as “the denormalization
paper” in what follows. Now, the author begins by arguing against denormalization. To quote:
I think the normalization principles should be treated as commandments ... unless you're faced with performance
problems that money, hardware scalability, current SQL technology, network optimization, parallelization, or other
performance techniques can't resolve [ slightly reworded, boldface added ].
I couldn't agree more with this position. Indeed, I'm on record as saying very much the same thing myself:
In a paper I wrote in 1990 on the use of SQL systems in practice, 4 I recommended denormalization as a performance
tactic “only if all else fails.” Unfortunately, however, the rest of the denormalization paper tends to suggest that the
author doesn't really know what denormalization is; after the opening position statement quoted above, the paper
goes on to give some eight examples of “designing for performance,” all but one of which have absolutely nothing
to do with denormalization at all!
In the author's defense, however, I say again that it does seem to be difficult to find a precise definition of
denormalization in the literature. Of course, it could be argued that no such definition is needed, given that
(a) denormalization, whatever else it might be, must surely be the inverse of normalization, and (b) normalization in
turn certainly is precisely defined. For the record, however, I'll give some idea as to what a precise definition of
denormalization might look like in just a moment. Before I do, however, let me make it clear that I have no
particular quarrel with the specific design tactics suggested in the denormalization paper; indeed, I suggested several
2 I realize the mapping from relvars to stored files isn't always exactly one to one as I'm suggesting here─for example, some products allow
several relvars to share the same stored file, and some allow a single relvar to span several stored files. But these facts don't significantly affect
the bigger picture, and I ignore them here for simplicity.
3 Sam Hamdan: “Denormalization and SQL-DBMS,” SQL Forum 4 , No. 1 (January/February 1995).
4 “SQL Dos and Don'ts,” in Relational Database Writings 1985-1989 (Addison-Wesley, 1990).