Denormalization - Database Design and Relational Theory

Databases Reference

In-Depth Information

we've backed off from a position that does at least have some solid science and logical theory behind it, and

replaced it by one that's purely pragmatic in nature (as well as being based, typically, on a somewhat narrow

perspective on the overall problem).

WHAT DENORMALIZATION ISN'T (I)

I've said that denormalization means increasing redundancy. But it doesn't follow that increasing redundancy

means denormalization! This is one of the traps the denormalization paper falls into; the design tactics it describes

do increase redundancy (usually), but they're not—with, as noted earlier, one sole exception—applications of

denormalization per se. (In logic, if p implies q is true, it doesn't follow that q implies p is true, and to argue

otherwise is a well known example of faulty reasoning: so well known, in fact, that it enjoys a special name, The

Fallacy of False Conversion .)

Let's examine a few of the examples from the denormalization paper. In one, we're given relvars ITEM and

SALES that look like this:

ITEM { INO , INAME }

KEY { INO }

SALES { SNO , INO , QTY }

KEY { SNO , INO }

FOREIGN KEY { INO } REFERENCES ITEM

The predicates are Item INO has name INAME and Quantity QTY of item INO were sold in store SNO ,

respectively. For performance reasons, the paper suggests adding a TOTAL_QTY attribute to the ITEM relvar,

whose value for any given item is the total sales of that item taken over all stores. But although it's true that the

resulting design involves some redundancy, the fact remains that both relvars are still in BCNF (note in particular

that the FD {INO} → {TOTAL_QTY} holds in the revised version of relvar ITEM). In other words, there's no

denormalization, as such, in this example.

A second example involves what the paper calls “an internal array”:

EMP { ENO , JAN_PAY , FEB_PAY , ..., DEC_PAY }

KEY { ENO }

The predicate is Employee ENO was paid an amount JAN_PAY in January, ..., and an amount DEC_PAY in

December . And presumably, though the paper doesn't say as much explicitly, this “tuple wise” design is meant to

be contrasted with—and for performance reasons, possibly preferred to—the following “attribute wise” analog:

EMP { ENO , MONTH , PAY }

KEY { ENO , MONTH }

But both designs are in BCNF. Again, there's no denormalization here; in fact, to get ahead of myself for a moment

(see Chapter 15), I would say there's no increase in redundancy, either. (On the other hand, the original “tuple

wise” design is probably bad, as you'll see if you consider the query “Get employees whose salary was less than 5K

in at least one month, together with the months in question.”)

Yet another example involves splitting a RESELLERS relvar “horizontally” into two separate relvars,

ACTIVE_RESELLERS and INACTIVE_RESELLERS. In other words, the original relvar is decomposed via

restriction (not projection), and is reconstructed from the two restrictions via union (not join). So we're clearly not

Search WWH ::

Custom Search

Home