AIMS OF THIS TOPIC
If you're like me, you'll have encountered lots of design theory terms in the literature and live presentations and
the like (terms such as projection-join normal form, the chase, join dependency, FD preservation, and many
others), and I'm sure you've wondered from time to time exactly what they all mean. Thus, it's one of my aims in
this topic to explain such terms: to define them carefully and accurately, to explain their relevance and applicability,
and generally to remove any air of mystery that might seem to surround them. And if I'm successful in that aim, I'll
have gone a good way to explaining what design theory is and why it's important (indeed, a possible alternative title
for the topic would be Database Design Theory: What It Is and Why You Should Care). Overall, it's my goal to
provide a painless introduction to design theory for database professionals. More specifically, what I want to do is:
• Review, though from a possibly unfamiliar perspective, aspects of design you should already be familiar with
• Explore in depth aspects you're probably not already familiar with
• Provide clear and accurate explanations and definitions (with plenty of examples) of all pertinent concepts
• Not spend a lot of time on material that's widely understood already, such as 2NF and 3NF[10]
All of that being said, I should say too that database design is not my favorite subject. The reason it's not is
that much of that subject is still somewhat ... well, subjective. As I said earlier, design theory is the scientific
foundation for database design. Sadly, however, there are numerous design issues that the theory simply doesn't
address at all (yet). Thus, while the formal principles I'll be describing in this topic do represent the scientific part
of design, there are other parts that, as I've put it elsewhere, are still more in the nature of an artistic endeavor.
Indeed, one message of the topic is precisely that we need more science in this field.
To put a more positive spin on matters, I'd like to draw your attention to the following. Design theory is (at
least in part) about capturing the meaning of data, and as Codd himself once said in connection with that notion:[11]
[The] task of capturing (in a reasonably formal way) more of ... the meaning of data is a never-ending one ... The goal is
nevertheless an extremely important one, because even small successes can bring understanding and order into the field
of database design.
In fact, I'll go further: If your design violates any of the known science, then, as I've written elsewhere (in a
slightly different context), the one thing you can be sure of is that things will go wrong. And though it might be
hard to say exactly what will go wrong, and it might be hard to say whether things will go wrong in a major or minor
way, you know (it's guaranteed) that they will go wrong. Theory is important.
[10] However, I will at least give precise definitions of those familiar concepts for reasons of completeness. And since I'm sure they really are
familiar, I'll take the liberty of appealing to them from time to time even before we get to the definitions.
[11] The quote is from Codd's paper “Extending the Database Relational Model to Capture More Meaning,” ACM TODS 4, No. 4, 1979 (the italics
are mine). Ted Codd was, of course, the inventor of the relational model; he was also the person who first defined the concept of normalization
in general, as well as the first three normal forms (1NF, 2NF, 3NF) in particular.