Databases Reference
In-Depth Information
DENORMALIZATION CONSIDERED HARMFUL (I)
In this section, I'd like to present an argument—a logical argument, that is, and one you might not have seen
before—in support of the position that you should denormalize only as a last resort. Essentially, the argument is that
while (as is well known) denormalization can be logically bad for update, it can be logically bad for retrieval as
well , in the sense that it can make certain queries harder to formulate. (Alternatively, it can make them easier to
formulate incorrectly, meaning that, if they execute, you're getting answers that might be correct in themselves but
are answers to the wrong questions.) Let me illustrate.
Once again consider relvar S, with its FD {CITY} → {STATUS}. As noted earlier, that relvar can be
regarded as the result of denormalizing relvars SNC (with attributes SNO, SNAME, and CITY) and CT (with
attributes CITY and STATUS). Now consider the query “Get the average supplier city status value.” Given the
sample values in Fig. 3.2, the status values for Athens, London, and Paris are 30, 20, and 30, respectively, and so the
average is 80/3, which is 26.667 to three decimal places. Here then are some attempts at formulating this query in
SQL (I'll assume for simplicity that S is nonempty, so we don't have to worry about what happens in SQL if we try
to apply the AVG operator to an empty set): 13
1. SELECT AVG ( STATUS ) AS RESULT
FROM S
Result (incorrect): 26. The problem here is that London's status and Paris's status have both been counted
twice. Perhaps we need a DISTINCT inside the AVG invocation? Let's try that:
2. SELECT AVG ( DISTINCT STATUS ) AS RESULT
FROM S
Result (incorrect): 25. No, it's distinct cities we need to examine, not distinct status values. We can do that
by grouping:
3. SELECT CITY , AVG ( STATUS ) AS RESULT
FROM S
GROUP BY CITY
Result (incorrect): (Athens,30), (London,20), (Paris,30). This formulation gives average status per city , not
the overall average. Perhaps what we want is the average of the averages?─
4. SELECT CITY , AVG ( AVG ( STATUS ) ) AS RESULT
FROM S
GROUP BY CITY
Result: Syntax error—the SQL standard quite rightly doesn't allow “set function” invocations to be nested in
this manner. 14 One more attempt:
13 What changes would be needed to the various SQL expressions if we couldn't make that assumption?
14 I say “quite rightly” only because we're in the SQL context specifically; a more orthodox language such as Tutorial D would certainly let us
nest such invocations (or its analog of such invocations, rather). Let me explain. Consider the SQL expression SELECT SUM(QTY) AS
RESULT FROM SP WHERE QTY > 100 (I deliberately switch to a different example for reasons of clarity). The argument to the SUM
invocation here is really what's denoted by the expression QTY FROM SP WHERE QTY > 100, and a more orthodox language would therefore
enclose that whole expression in parentheses. But SQL doesn't. As a consequence, an expression of the form AVG(SUM(QTY)) has to be
illegal, because SQL can't figure out which portions of the surrounding expression have to do with the AVG argument and which with the SUM
argument.
Search WWH ::




Custom Search