Databases Reference
In-Depth Information
in today's systems, which are typically much less than ideal, I believe we should denormalize only as a last resort.
That is, we should back off from a fully normalized design only if all other strategies for improving performance
have failed, somehow, to meet requirements. (Of course, I'm going along here with the usual assumption that
normalization has performance implications—as indeed it does, typically, in current SQL products.)
So far in this chapter I've given what I think is a reasonable definition of what denormalization is, and I've given
some examples of what it isn't. However, perhaps it was simply a mistake on my part to think the term ought to be
used in any kind of precise or logical sense. Certainly it's used very imprecisely in the industry at large; in fact, it
seems to be used—especially in a data warehouse context—to refer to just about anything that could be regarded as
bad design practice. What's more, the practices in question are often explicitly recommended! (by the people who
talk that way, that is). Examples of such bad practice include:
Using repeating groups
Permitting duplicate rows
Using nulls; worse (?), allowing nulls in keys
Mixing different kinds of information in the same column (using a separate “flag” column to specify the type
of individual values in the column in question)
Using a single text column to represent what ought logically to be distinct columns
I'd like to add a note here on star schemas , since the “star schema” concept and “denormalization” are often
mentioned together. 10 The basic idea behind this concept as follows. Suppose we wish to collect a history of
business transactions for analysis purposes; for example, suppose in the case of suppliers and parts that we wish to
record, for each shipment, the particular time interval in which that shipment occurred. Thus, we might identify
time intervals by a time interval identifier (TINO), and introduce another relvar TI to relate those identifiers to the
corresponding time intervals per se. The revised shipments relvar SP and the new time intervals relvar TI might
look as shown in Fig. 8.1. In star schema terminology, SP is the fact table and TI is a dimension table . The
suppliers relvar S and the parts relvar P are also dimension tables (see Fig. 8.2). 11 And the overall structure is
referred to as a “star schema” because of a fancied resemblance of the corresponding entity/relationship diagram to a
star, with the fact table being surrounded by—and connected by “spokes” or “rays” to—the dimension tables, as
shown in Fig. 8.2. (Those “spokes” or “rays” represent foreign key references, of course.)
10 At least one authority claims it's misleading to refer to star schemas as denormalized, however. “[The] use of denormalized when describing a
star implies that the design started out as normalized. Most designs are not produced in such a manner. Not normalized would be a better
description” (from Star Schema: The Complete Reference , by Chris Adamson, McGraw-Hill, 2010).
11 For simplicity I choose to ignore (just for the sake of thye present discussion) the fact that the FD {CITY} → {STATUS} is supposed to hold
in relvar S, and hence that relvar S is less than fully normalized.
Search WWH ::

Custom Search