I/O. Analogous remarks apply to any query that accesses suppliers only, or parts only, or shipments only,
instead of performing some kind of join.
Note too that denormalization, again because it increases redundancy, will most likely lead to bigger stored
records, and this fact too can lead to more I/O, not less. For example, a 4K page can hold two 2K stored
records but only one 3K stored record; hence, a denormalization that increases redundancy by 50 percent
could increase I/O by 100 percent (I'm speaking pretty loosely here, of course).
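The page-size arithmetic above can be sketched directly. The snippet below is a minimal illustration (the 4K page size and the assumption that records can't span page boundaries are taken from the example, not from any particular DBMS):

```python
# Illustration of the page-size arithmetic in the text: how many pages
# (a rough proxy for I/O) are needed to hold a set of fixed-size records,
# assuming records cannot span page boundaries.

PAGE_SIZE = 4096  # bytes; the 4K page from the example

def pages_needed(num_records: int, record_size: int, page_size: int = PAGE_SIZE) -> int:
    """Pages required to store num_records records of record_size bytes."""
    per_page = page_size // record_size      # whole records per page
    return -(-num_records // per_page)       # ceiling division

# 1,000 records at 2K each: two per page -> 500 pages.
# The same records denormalized to 3K each (50% bigger): one per page -> 1,000 pages.
print(pages_needed(1000, 2048))  # 500
print(pages_needed(1000, 3072))  # 1000
```

So a 50 percent increase in record size really can double the page count, which is the loose sense in which the text says I/O can increase by 100 percent.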
My next observation is that even if we accept the claim that denormalization makes retrievals easier to
express and perform better, it certainly makes updates harder to express and perform worse. Now, this point is (as I
said before) widely understood; however, what's not so widely understood is that denormalization opens the door to
integrity violations, too. For example, in relvar S (as opposed to the projection relvars SNC and CT), someone (either the system or the user, and in current practice probably the latter) is going to have to be responsible for
maintaining the FD {CITY} → {STATUS}; and if that maintenance isn't done, integrity is lost. (By contrast, in the
two-relvar design, all that has to be done is to enforce the key constraint on CT—which will definitely be done by
the system, not the user—and the fact that each city has one status will then be maintained “automatically.”)
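To make the maintenance burden concrete, here is a minimal sketch of the check the user would have to perform in the one-relvar design. The attribute names and sample rows follow the usual suppliers example; the helper function `fd_holds` is hypothetical, not part of any DBMS:

```python
# A sketch of enforcing the FD {CITY} -> {STATUS} by hand in the
# denormalized relvar S. In the two-relvar design (SNC and CT), the
# key constraint on CT makes this check unnecessary.

def fd_holds(rows, determinant, dependent):
    """True iff the FD determinant -> dependent holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        val = tuple(row[a] for a in dependent)
        if seen.setdefault(key, val) != val:
            return False  # same determinant value, two different dependent values
    return True

s = [
    {"SNO": "S1", "SNAME": "Smith", "STATUS": 20, "CITY": "London"},
    {"SNO": "S4", "SNAME": "Clark", "STATUS": 20, "CITY": "London"},
]
print(fd_holds(s, ["CITY"], ["STATUS"]))  # True: each city has one status
```

If an update gave S1 and S4 different STATUS values while both remained in London, `fd_holds` would return False; in the denormalized design, nothing stops such an update unless the user (or a hand-written constraint) runs a check like this.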
My final point is this: Regardless of whether we're talking about

a. True denormalization, which is done at the physical level only, or
b. The kind of denormalization we have to do in most of today's SQL products, which affects the logical level as well,
what isn't widely enough appreciated is that when people say “denormalize for performance,” they're really referring to the performance of specific applications. As I put it earlier, denormalization is typically based on a
somewhat narrow perspective on the overall problem. Any given physical design is likely to be good for some
applications but bad for others (in terms of its performance implications, that is).
A FINAL REMARK
In this chapter, I've given some strong arguments in favor of not denormalizing; in effect, therefore, I've given
arguments in favor of normalizing. 15 And it's certainly true that good designs are usually fully normalized. But it's
important to understand that the opposite isn't necessarily true! That is, a design can be fully normalized and yet
still be bad. For example, the projection ST of relvar S on attributes SNO and STATUS is certainly in BCNF—in
fact, it's in the highest possible normal form, as we'll see in Part III of this book—but it's clearly not a good design,
as we saw in Chapter 6.
EXERCISES
8.1 It's sometimes claimed that one advantage of binary relvars over the general n- ary relvars supported by the
relational model is that binary relvars are always in BCNF (implying among other things that we don't need to
15 One reviewer noted that another advantage of normalization is that it tends to simplify the formulation of complex queries, in that it makes
expressions more modular through nesting of simple subexpressions.