Preliminaries - Database Design and Relational Theory


AIMS OF THIS TOPIC

If you're like me, you'll have encountered lots of design theory terms in the literature and live presentations and

the like (terms such as projection-join normal form, the chase, join dependency, FD preservation, and many

others), and I'm sure you've wondered from time to time exactly what they all mean. Thus, it's one of my aims in

this topic to explain such terms: to define them carefully and accurately, to explain their relevance and applicability,

and generally to remove any air of mystery that might seem to surround them. And if I'm successful in that aim, I'll

have gone a good way to explaining what design theory is and why it's important (indeed, a possible alternative title

for the topic would be Database Design Theory: What It Is and Why You Should Care). Overall, it's my goal to

provide a painless introduction to design theory for database professionals. More specifically, what I want to do is:



Review, though from a possibly unfamiliar perspective, aspects of design you should already be familiar with



Explore in depth aspects you're probably not already familiar with



Provide clear and accurate explanations and definitions (with plenty of examples) of all pertinent concepts

Not spend a lot of time on material that's widely understood already, such as 2NF and 3NF [10]



All of that being said, I should say too that database design is not my favorite subject. The reason it's not is

that much of that subject is still somewhat ... well, subjective. As I said earlier, design theory is the scientific

foundation for database design. Sadly, however, there are numerous design issues that the theory simply doesn't

address at all (yet). Thus, while the formal principles I'll be describing in this topic do represent the scientific part

of design, there are other parts that, as I've put it elsewhere, are still more in the nature of an artistic endeavor.

Indeed, one message of the topic is precisely that we need more science in this field.

To put a more positive spin on matters, I'd like to draw your attention to the following. Design theory is (at

least in part) about capturing the meaning of data, and as Codd himself once said in connection with that notion: [11]

[The] task of capturing (in a reasonably formal way) more of ... the meaning of data is a never-ending one ... The goal is

nevertheless an extremely important one, because even small successes can bring understanding and order into the field

of database design.

In fact, I'll go further: If your design violates any of the known science, then, as I've written elsewhere (in a

slightly different context), the one thing you can be sure of is that things will go wrong. And though it might be

hard to say exactly what will go wrong, and it might be hard to say whether things will go wrong in a major or minor

way, you know (it's guaranteed) that they will go wrong. Theory is important.

[10] However, I will at least give precise definitions of those familiar concepts for reasons of completeness. Since I'm sure they really are familiar,

however, I'll take the liberty of appealing to them from time to time even before we get to the definitions.

[11] The quote is from Codd's paper “Extending the Database Relational Model to Capture More Meaning,” ACM TODS 4, No. 4, 1979 (the italics

are mine). Ted Codd was, of course, the inventor of the relational model; he was also the person who first defined the concept of normalization

in general, as well as the first three normal forms (1NF, 2NF, 3NF) in particular.
