Databases Reference
In-Depth Information
And here's yet another example, based on a simple airline application (the predicate is Pilot PILOT takes a
flight out from gate GATE at hour HOUR on day DAY ):
ROSTER { DAY , HOUR , GATE , PILOT }
KEY { DAY , HOUR , GATE }
KEY { DAY , HOUR , PILOT }
How do we choose the primary key in cases such as these? What grounds are there for choosing one key
over another? Codd's criterion of “simplicity” doesn't seem to help. Note too that whichever we choose, we wind
up with an unpleasant asymmetry; e.g., in the marriage example, we might find ourselves treating one spouse as
“more equal than the other” (and thereby certainly offending someone). Why should we be forced to introduce such
asymmetry? Asymmetry is usually not a good idea. Here again, repeated from Chapter 15, is that quote from Polya:
“Try to treat symmetrically what is symmetrical, and do not destroy wantonly any natural symmetry.”
Now, in all of the foregoing examples the keys were not only composite, they overlapped (i.e., they had an
attribute in common). Lest it be thought that it's only when keys overlap that there might be difficulty in choosing
the primary key, therefore, let me give a counterexample. Suppose we have a relvar ELEMENT representing the
periodic table (i.e., the table of of chemical elements). 5 Then every element has a unique name (e.g., lead), a unique
symbol (e.g., the symbol for lead is Pb), and a unique atomic number (e.g., the atomic number for lead is 82) . The
relvar thus clearly has three distinct keys, all of which are simple (i.e., involve just one attribute), and there's
obviously no overlap at all. On what grounds do we choose one of these three keys as the primary key? It seems to
me a good case could be made for any of them, depending on circumstances.
Here's another familiar (perhaps all too familiar) example of a relvar with several keys, all of which are
simple:
TAX_BRACKET { LOW , HIGH , PERCENTAGE }
KEY { LOW }
KEY { HIGH }
KEY { PERCENTAGE }
Of course, I'm assuming here that no two taxable income ranges (LOW to HIGH) are subject to the same tax rate.
I could give many more examples, but by now my point is presumably clear: Not only are there no formal
criteria for choosing one key over another (in those cases where there's a choice), but sometimes there don't appear
to be any informal criteria either. Thus, it really doesn't seem appropriate to insist that such a choice must always be
made, even if it's appropriate in some cases (perhaps even most cases).
There's another important point that needs to be mentioned, a more formal one than most of those I've been
making so far. Over the past 40 years or so, a great deal of research has been carried out on dependency theory and
further normalization, view updating, optimization (including semantic optimization in particular), usability, and
many other matters. And in all of this research it's candidate keys, not primary keys, that play the crucial role.
(Indeed, it must be, precisely because the research in question is formal. The candidate key concept is formally
defined. The primary key concept isn't.) Since this is so, it really doesn't seem appropriate to insist formally on the
PK:AK distinction─though, to repeat, it might be appropriate to recommend it in formally.
Yet another point I want to make is that the PK:AK distinction leads to an undesirable and unnecessary
differentiation between base relvars and other relvars. That's because, according to Codd, the relational model:
5 Like the PLUS example (q.v.) in Chapter 13, ELEMENT might actually be a relation constant rather than a relation variable, but it still has
keys.
Search WWH ::




Custom Search