Databases Reference
In-Depth Information
5. SELECT AVG ( TEMP.STATUS ) AS RESULT
FROM ( SELECT DISTINCT S.CITY , S.STATUS
FROM S ) AS TEMP
Result (correct at last): 26.667. But note how complicated this expression is compared to its analog on the
fully normalized design (relvars SNC and CT):
6. SELECT AVG ( STATUS ) AS RESULT
FROM CT
DENORMALIZATION CONSIDERED HARMFUL (II)
I said earlier that the argument in favor of denormalization was that it makes retrievals easier to express and makes
them perform better. But does this argument really stand up to careful analysis? Let's take a closer look.
First of all, it clearly isn't true across the board that retrievals are easier to express; the previous section
presented a detailed counterexample, but the point can be made with much simpler examples. By way of
illustration, consider what's involved in formulating the query “Get all supplier details” against (a) the normalized
design of Fig. 1.1 and (b) a denormalized design in which relvars S, SP, and P are replaced by a single “joined”
relvar called, say, SSPP. Here are Tutorial D formulations:
a. S
b. SSPP { SNO , SNAME , STATUS , CITY }
Or if you prefer SQL:
a. SELECT *
FROM S
b. SELECT DISTINCT SNO , SNAME , STATUS , CITY
FROM SSPP
The next point is that many queries are likely to perform worse, too. There are several reasons for this state
of affairs. One is that denormalization leads to redundancy, which in turn can lead to a need to do duplicate
elimination (note that DISTINCT in the second of the foregoing SQL formulations!). Another is as follows:
Suppose again that the join of suppliers, shipments, and parts is represented as one single stored file. Also,
assume for simplicity that any given stored file consists of a physically contiguous collection of stored
records, one for each tuple currently appearing in the relvar the stored file represents.
Let's suppose too for the sake of the argument that the query “Get details for suppliers who supply red parts”
will perform reasonably well against this physical structure. OK; but the query “Get all supplier details” will
perform worse than it would against the structure in which the three relvars map to three physically separate
stored files! Why? Because in the latter design, all supplier stored records will be physically contiguous,
whereas in the former design they'll effectively be spread over a wider area, and will therefore require more
 
Search WWH ::




Custom Search