I/O. Analogous remarks apply to any query that accesses suppliers only, or parts only, or shipments only,
instead of performing some kind of join.
Note too that denormalization, again because it increases redundancy, will most likely lead to bigger stored
records, and this fact too can lead to more I/O, not less. For example, a 4K page can hold two 2K stored
records but only one 3K stored record; hence, a denormalization that increases redundancy by 50 percent
could increase I/O by 100 percent (I'm speaking pretty loosely here, of course).
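The page-size arithmetic above can be sketched directly. The snippet below is a minimal illustration (the 4K page size and the assumption that records can't span page boundaries are taken from the example, not from any particular DBMS):

```python
# Illustration of the page-size arithmetic in the text: how many pages
# (a rough proxy for I/O) are needed to hold a set of fixed-size records,
# assuming records cannot span page boundaries.

PAGE_SIZE = 4096  # bytes; the 4K page from the example

def pages_needed(num_records: int, record_size: int, page_size: int = PAGE_SIZE) -> int:
    """Pages required to store num_records records of record_size bytes."""
    per_page = page_size // record_size      # whole records per page
    return -(-num_records // per_page)       # ceiling division

# 1,000 records at 2K each: two per page -> 500 pages.
# The same records denormalized to 3K each (50% bigger): one per page -> 1,000 pages.
print(pages_needed(1000, 2048))  # 500
print(pages_needed(1000, 3072))  # 1000
```

So a 50 percent increase in record size really can double the page count, which is the loose sense in which the text says I/O can increase by 100 percent.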
My next observation is that even if we accept the claim that denormalization makes retrievals easier to
express and perform better, it certainly makes updates harder to express and perform worse. Now, this point is (as I
said before) widely understood; however, what's not so widely understood is that denormalization opens the door to
integrity violations, too. For example, in relvar S (as opposed to the projection relvars SNC and CT), someone (either the system or the user, and in current practice probably the latter) is going to have to be responsible for
maintaining the FD {CITY} → {STATUS}; and if that maintenance isn't done, integrity is lost. (By contrast, in the
two-relvar design, all that has to be done is to enforce the key constraint on CT—which will definitely be done by
the system, not the user—and the fact that each city has one status will then be maintained “automatically.”)
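To make the maintenance burden concrete, here is a minimal sketch of the check the user would have to perform in the one-relvar design. The attribute names and sample rows follow the usual suppliers example; the helper function `fd_holds` is hypothetical, not part of any DBMS:

```python
# A sketch of enforcing the FD {CITY} -> {STATUS} by hand in the
# denormalized relvar S. In the two-relvar design (SNC and CT), the
# key constraint on CT makes this check unnecessary.

def fd_holds(rows, determinant, dependent):
    """True iff the FD determinant -> dependent holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        val = tuple(row[a] for a in dependent)
        if seen.setdefault(key, val) != val:
            return False  # same determinant value, two different dependent values
    return True

s = [
    {"SNO": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"SNO": "S4", "SNAME": "Clark", "STATUS": 20, "CITY": "London"},
]
print(fd_holds(s, ["CITY"], ["STATUS"]))  # True: each city has one status
```

If an update gave S1 and S4 different STATUS values while both remained in London, `fd_holds` would return False; in the denormalized design, nothing stops such an update unless the user (or a hand-written constraint) runs a check like this.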
My final point is this: Regardless of whether we're talking about

a. True denormalization, which is done at the physical level only, or
b. The kind of denormalization we have to do in most of today's SQL products, which affects the logical level as well,
what isn't widely enough appreciated is that when people say “denormalize for performance,” they're really referring to the performance of specific applications. As I put it earlier, denormalization is typically based on a
somewhat narrow perspective on the overall problem. Any given physical design is likely to be good for some
applications but bad for others (in terms of its performance implications, that is).
A FINAL REMARK
In this chapter, I've given some strong arguments in favor of not denormalizing; in effect, therefore, I've given
arguments in favor of normalizing. 15 And it's certainly true that good designs are usually fully normalized. But it's
important to understand that the opposite isn't necessarily true! That is, a design can be fully normalized and yet
still be bad. For example, the projection ST of relvar S on attributes SNO and STATUS is certainly in BCNF—in
fact, it's in the highest possible normal form, as we'll see in Part III of this book—but it's clearly not a good design,
as we saw in Chapter 6.
EXERCISES
8.1 It's sometimes claimed that one advantage of binary relvars over the general n- ary relvars supported by the
relational model is that binary relvars are always in BCNF (implying among other things that we don't need to
15 One reviewer noted that another advantage of normalization is that it tends to simplify the formulation of complex queries, in that it makes
expressions more modular through nesting of simple subexpressions.