Databases Reference
In-Depth Information
we've backed off from a position that does at least have some solid science and logical theory behind it, and
replaced it by one that's purely pragmatic in nature (as well as being based, typically, on a somewhat narrow
perspective on the overall problem).
WHAT DENORMALIZATION ISN'T (I)
I've said that denormalization means increasing redundancy. But it doesn't follow that increasing redundancy
means denormalization! This is one of the traps the denormalization paper falls into; the design tactics it describes
do increase redundancy (usually), but they're not—with, as noted earlier, one sole exception—applications of
denormalization per se. (In logic, if p implies q is true, it doesn't follow that q implies p is true, and to argue
otherwise is a well known example of faulty reasoning: so well known, in fact, that it enjoys a special name, The
Fallacy of False Conversion .)
Let's examine a few of the examples from the denormalization paper. In one, we're given relvars ITEM and
SALES that look like this:
ITEM { INO , INAME }
KEY { INO }
SALES { SNO , INO , QTY }
KEY { SNO , INO }
FOREIGN KEY { INO } REFERENCES ITEM
The predicates are Item INO has name INAME and Quantity QTY of item INO were sold in store SNO ,
respectively. For performance reasons, the paper suggests adding a TOTAL_QTY attribute to the ITEM relvar,
whose value for any given item is the total sales of that item taken over all stores. But although it's true that the
resulting design involves some redundancy, the fact remains that both relvars are still in BCNF (note in particular
that the FD {INO} → {TOTAL_QTY} holds in the revised version of relvar ITEM). In other words, there's no
denormalization, as such, in this example.
A second example involves what the paper calls “an internal array”:
EMP { ENO , JAN_PAY , FEB_PAY , ..., DEC_PAY }
KEY { ENO }
The predicate is Employee ENO was paid an amount JAN_PAY in January, ..., and an amount DEC_PAY in
December . And presumably, though the paper doesn't say as much explicitly, this “tuple wise” design is meant to
be contrasted with—and for performance reasons, possibly preferred to—the following “attribute wise” analog:
EMP { ENO , MONTH , PAY }
KEY { ENO , MONTH }
But both designs are in BCNF. Again, there's no denormalization here; in fact, to get ahead of myself for a moment
(see Chapter 15), I would say there's no increase in redundancy, either. (On the other hand, the original “tuple
wise” design is probably bad, as you'll see if you consider the query “Get employees whose salary was less than 5K
in at least one month, together with the months in question.”)
Yet another example involves splitting a RESELLERS relvar “horizontally” into two separate relvars,
ACTIVE_RESELLERS and INACTIVE_RESELLERS. In other words, the original relvar is decomposed via
restriction (not projection), and is reconstructed from the two restrictions via union (not join). So we're clearly not
 
Search WWH ::




Custom Search