Primary Keys Are Nice but Not Essential - Database Design and Relational Theory

Databases Reference

In-Depth Information

And here's yet another example, based on a simple airline application (the predicate is Pilot PILOT takes a

flight out from gate GATE at hour HOUR on day DAY ):

ROSTER { DAY , HOUR , GATE , PILOT }

KEY { DAY , HOUR , GATE }

KEY { DAY , HOUR , PILOT }

How do we choose the primary key in cases such as these? What grounds are there for choosing one key

over another? Codd's criterion of “simplicity” doesn't seem to help. Note too that whichever we choose, we wind

up with an unpleasant asymmetry; e.g., in the marriage example, we might find ourselves treating one spouse as

“more equal than the other” (and thereby certainly offending someone). Why should we be forced to introduce such

asymmetry? Asymmetry is usually not a good idea. Here again, repeated from Chapter 15, is that quote from Polya:

“Try to treat symmetrically what is symmetrical, and do not destroy wantonly any natural symmetry.”

Now, in all of the foregoing examples the keys were not only composite, they overlapped (i.e., they had an

attribute in common). Lest it be thought that it's only when keys overlap that there might be difficulty in choosing

the primary key, therefore, let me give a counterexample. Suppose we have a relvar ELEMENT representing the

periodic table (i.e., the table of of chemical elements). 5 Then every element has a unique name (e.g., lead), a unique

symbol (e.g., the symbol for lead is Pb), and a unique atomic number (e.g., the atomic number for lead is 82) . The

relvar thus clearly has three distinct keys, all of which are simple (i.e., involve just one attribute), and there's

obviously no overlap at all. On what grounds do we choose one of these three keys as the primary key? It seems to

me a good case could be made for any of them, depending on circumstances.

Here's another familiar (perhaps all too familiar) example of a relvar with several keys, all of which are

simple:

TAX_BRACKET { LOW , HIGH , PERCENTAGE }

KEY { LOW }

KEY { HIGH }

KEY { PERCENTAGE }

Of course, I'm assuming here that no two taxable income ranges (LOW to HIGH) are subject to the same tax rate.

I could give many more examples, but by now my point is presumably clear: Not only are there no formal

criteria for choosing one key over another (in those cases where there's a choice), but sometimes there don't appear

to be any informal criteria either. Thus, it really doesn't seem appropriate to insist that such a choice must always be

made, even if it's appropriate in some cases (perhaps even most cases).

There's another important point that needs to be mentioned, a more formal one than most of those I've been

making so far. Over the past 40 years or so, a great deal of research has been carried out on dependency theory and

further normalization, view updating, optimization (including semantic optimization in particular), usability, and

many other matters. And in all of this research it's candidate keys, not primary keys, that play the crucial role.

(Indeed, it must be, precisely because the research in question is formal. The candidate key concept is formally

defined. The primary key concept isn't.) Since this is so, it really doesn't seem appropriate to insist formally on the

PK:AK distinction─though, to repeat, it might be appropriate to recommend it in formally.

Yet another point I want to make is that the PK:AK distinction leads to an undesirable and unnecessary

differentiation between base relvars and other relvars. That's because, according to Codd, the relational model:

5 Like the PLUS example (q.v.) in Chapter 13, ELEMENT might actually be a relation constant rather than a relation variable, but it still has

keys.

Search WWH ::

Custom Search

Home