Denormalization - Database Design and Relational Theory

Databases Reference

In-Depth Information

in today's systems, which are typically much less than ideal, I believe we should denormalize only as a last resort.

That is, we should back off from a fully normalized design only if all other strategies for improving performance

have failed, somehow, to meet requirements. (Of course, I'm going along here with the usual assumption that

normalization has performance implications—as indeed it does, typically, in current SQL products.)

WHAT DENORMALIZATION ISN'T (II)

So far in this chapter I've given what I think is a reasonable definition of what denormalization is, and I've given

some examples of what it isn't. However, perhaps it was simply a mistake on my part to think the term ought to be

used in any kind of precise or logical sense. Certainly it's used very imprecisely in the industry at large; in fact, it

seems to be used—especially in a data warehouse context—to refer to just about anything that could be regarded as

bad design practice. What's more, the practices in question are often explicitly recommended! (by the people who

talk that way, that is). Examples of such bad practice include:



Using repeating groups



Permitting duplicate rows



Using nulls; worse (?), allowing nulls in keys



Mixing different kinds of information in the same column (using a separate “flag” column to specify the type

of individual values in the column in question)



Using a single text column to represent what ought logically to be distinct columns

I'd like to add a note here on star schemas , since the “star schema” concept and “denormalization” are often

mentioned together. 10 The basic idea behind this concept as follows. Suppose we wish to collect a history of

business transactions for analysis purposes; for example, suppose in the case of suppliers and parts that we wish to

record, for each shipment, the particular time interval in which that shipment occurred. Thus, we might identify

time intervals by a time interval identifier (TINO), and introduce another relvar TI to relate those identifiers to the

corresponding time intervals per se. The revised shipments relvar SP and the new time intervals relvar TI might

look as shown in Fig. 8.1. In star schema terminology, SP is the fact table and TI is a dimension table . The

suppliers relvar S and the parts relvar P are also dimension tables (see Fig. 8.2). 11 And the overall structure is

referred to as a “star schema” because of a fancied resemblance of the corresponding entity/relationship diagram to a

star, with the fact table being surrounded by—and connected by “spokes” or “rays” to—the dimension tables, as

shown in Fig. 8.2. (Those “spokes” or “rays” represent foreign key references, of course.)

10 At least one authority claims it's misleading to refer to star schemas as denormalized, however. “[The] use of denormalized when describing a

star implies that the design started out as normalized. Most designs are not produced in such a manner. Not normalized would be a better

description” (from Star Schema: The Complete Reference , by Chris Adamson, McGraw-Hill, 2010).

11 For simplicity I choose to ignore (just for the sake of thye present discussion) the fact that the FD {CITY} → {STATUS} is supposed to hold

in relvar S, and hence that relvar S is less than fully normalized.

Search WWH ::

Custom Search

Home