Databases Reference
In-Depth Information
“In order to transform data into a form that is useful to users ...”: This is simply a tendentious remark.
“[Join] is essentially a way of dynamically denormalizing data for greater ease of use”: The user might think
of joins being done dynamically, but there's no reason in general why they can't be done statically (i.e.,
ahead of time)─and I believe they often would be, given a well architected DBMS. 6 It's also untrue to
suggest that the result of a join must always be denormalized. “Greater ease of use” is another tendentious
remark.
“[Users] can't tolerate the time and cost of joins”: Joins aren't necessarily time consuming or expensive.
Again, it depends on the implementation.
“To address the problem, companies replicate data in an ever increasing number of decision support
databases, which represent denormalized views of the data”: This might be true, but if it is, it's an indictment
of current implementations, not an argument for denormalization as such.
8.3 First of all, surrogate keys are not the same thing as tuple IDs. For one thing (to state the obvious),
surrogates identify entities and tuple IDs identify tuples, and there's certainly nothing like a one to one
correspondence between entities and tuples. (Think of derived tuples in particular─for example, tuples in the result
of some query. In fact, it's not at all clear that derived tuples will have tuple IDs anyway.) Furthermore, tuple IDs
usually have performance connotations, but surrogates don't (access to a tuple via its tuple ID is usually assumed to
be fast, but no such observation applies to surrogates). Also, tuple IDs are usually concealed from the user, but
surrogates mustn't be, thanks to The Information Principle (see Exercise 2.1); in other words, it's probably (and
desirably!) not possible to store a tuple ID in a database relvar, while it certainly (and desirably) is possible to store a
surrogate in a database relvar. In a nutshell: Surrogate keys have to do with logical design, tuple IDs have to do
with physical design.
Are surrogate keys a good idea? Well, observe first that the relational model has nothing to say on this
question; like the business of design in general, in fact, whether or not to use surrogate keys has to do with how to
apply the relational model, not with the relational model as such.
That said, I have to say too that the question of whether surrogate keys are good or bad is far from
straightforward. There are strong arguments on both sides: so many such, in fact, that I can't possibly do justice to
them here (though some of them are summarized in Chapter 15). For further details, see the paper “Composite
Keys,” in my topic Relational Database Writings 1989-1991 (Addison-Wesley, 1992). Note: The paper is called
“Composite Keys” because surrogate keys are most likely to be useful in practice in situations in which existing
keys, and corresponding foreign keys, are composite keys specifically.
8.4
I show solutions in SQL, just for a change. Defining the first in terms of the second (in outline):
SELECT DISTINCT EX.ENO ,
( SELECT PAY
FROM EMP AS EY
WHERE EY.ENO = EX.ENO
AND MONTH = 'Jan' ) AS JAN_PAY ,
...
6 I have in mind here, primarily, a DBMS implemented using the facilities of The TransRelational tm Model (and a similar remark applies to all of
my uses of that phrase “well architected” throughout the present topic). You can find a preliminary (and very incomplete) description of that
model in my topic An Introduction to Database Systems (8th edition, Addison-Wesley, 2004), and a much more comprehensive account in my
topic Go Faster! The TransRelational tm Approach to DBMS Implementation (Ventus, 2002,2011).
Search WWH ::




Custom Search