Aside:
It might be argued, at least in the SNC and CT example, that the decomposition also means there's a
constraint that now doesn't have to be maintained: viz., the FD {CITY} → {STATUS}. But this argument
isn't entirely valid—all the decomposition does, in this respect, is move that constraint from one relvar to
another (actually from relvar S to relvar CT, where it's maintained as a side effect of maintaining the
constraint that {CITY} is a key).
End of aside.
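The aside's point can be sketched in code. The following is a minimal illustration (not from the book): relations are modeled as lists of dicts, and the attribute names CITY and STATUS follow the usual suppliers example. If {CITY} is a key of CT, then no two CT tuples can agree on CITY, so the FD {CITY} → {STATUS} can never be violated in CT.

```python
# Sketch (assumed representation): a relation as a list of dicts,
# with attribute names CITY and STATUS from the usual suppliers example.

def fd_holds(relation, lhs, rhs):
    """Check whether the FD lhs -> rhs holds in the given relation:
    tuples that agree on lhs must also agree on rhs."""
    seen = {}
    for t in relation:
        determinant = tuple(t[a] for a in lhs)
        dependent = tuple(t[a] for a in rhs)
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True

def is_key(relation, attrs):
    """Check that no two tuples agree on attrs (i.e., attrs is a superkey)."""
    projections = [tuple(t[a] for a in attrs) for t in relation]
    return len(projections) == len(set(projections))

ct = [
    {"CITY": "London", "STATUS": 20},
    {"CITY": "Paris",  "STATUS": 30},
]

# {CITY} is a key of CT, so the FD {CITY} -> {STATUS} holds as a side effect:
assert is_key(ct, ["CITY"])
assert fd_holds(ct, ["CITY"], ["STATUS"])
```

The key constraint subsumes the FD here: any relation in which {CITY} is unique trivially satisfies every FD whose determinant is {CITY}.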
Now, in the simple example under discussion, the benefits of doing the decomposition almost certainly
outweigh the benefits of not doing so. But such is not always the case; indeed, the question of whether or not to
decompose, in more complicated situations, can be a fairly vexing one. In what follows, in order to avoid a lot of
repetitive text, I'll tend to assume we do always want to do the decomposition—but please don't forget there can
sometimes be persuasive arguments for not doing so, especially in examples more complex than the one at hand,
such as are discussed in Part III of this book.
CONCLUDING REMARKS
I'd like to close this chapter by addressing a question I haven't discussed in this book at all so far. It's a matter of
terminology. To be specific: Why are 1NF, 2NF, and the rest called normal forms, anyway? Come to that, why is
normalization called normalization?
The answers to these questions derive from mathematics (though the ideas spill over into several related
disciplines, including the computing discipline in particular—think of floating point numbers, for example). In
mathematics, we often find ourselves having to deal with some large, possibly even infinite, set of objects of some
kind: for example, the set of all matrices, or the set of all rational numbers, or—coming a little closer to home—the
set of all relations. In such a situation, it's desirable to find a set of canonical forms for the objects in question.
Here's a definition:
Definition: Given a set s1, together with a defined notion of equivalence among elements of that set, subset s2 of s1 is a set of canonical forms for s1 if and only if every element x1 of s1 is equivalent to just one element x2 of s2 under that notion of equivalence (and that element x2 is the canonical form for the element x1).12
Various “interesting” properties that apply to x1 also apply to x2; thus, we can study just the small set s2, not the large set s1, in order to prove a variety of “interesting” theorems or results.
As a trivial illustration of this notion, let s1 be the set of nonnegative integers {0,1,2,...}, and let two such integers be equivalent if and only if they leave the same remainder on division by five. Then we can define s2 to be the set {0,1,2,3,4}. As for an “interesting” theorem that applies in this example, let x1, y1, and z1 be any three elements of s1 (i.e., any three nonnegative integers), and let their canonical forms in s2 be x2, y2, and z2, respectively; then the product y1 * z1 is equivalent to x1 if and only if the product y2 * z2 is equivalent to x2.
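The mod-5 illustration above is easy to make concrete. The following sketch (my own, not from the book) maps each nonnegative integer to its canonical form via the remainder on division by five, verifies that every element of s1 is equivalent to exactly one element of s2, and spot-checks the products theorem on a few triples.

```python
# Canonical forms for the mod-5 example: s1 is the nonnegative integers,
# s2 = {0, 1, 2, 3, 4}, and two integers are equivalent iff they leave
# the same remainder on division by five.

def canonical(x):
    """Map an element x1 of s1 to its canonical form x2 in s2."""
    return x % 5

def equivalent(a, b):
    return canonical(a) == canonical(b)

s2 = {0, 1, 2, 3, 4}

# Every element of s1 (checked here up to 99) is equivalent to
# exactly one element of s2:
for x1 in range(100):
    assert sum(1 for x2 in s2 if equivalent(x1, x2)) == 1

# The "interesting" theorem: y1 * z1 is equivalent to x1 if and only if
# y2 * z2 is equivalent to x2, where x2, y2, z2 are the canonical forms.
for x1, y1, z1 in [(6, 2, 3), (7, 4, 8), (11, 3, 9)]:
    x2, y2, z2 = canonical(x1), canonical(y1), canonical(z1)
    assert equivalent(y1 * z1, x1) == equivalent(y2 * z2, x2)
```

This is why canonical forms pay off: the theorem can be checked against the five elements of s2 rather than against the infinitely many elements of s1.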
12 It's reasonable to require also that every element x2 of s2 be equivalent to at least one element x1 of s1. Let me also draw your attention to the following remarks, paraphrased from the answer to Exercise 2.3 in Appendix D: Throughout this book, expressions of the form “B is a subset of A” must be understood to include the possibility that B and A might be equal. For example, the set {x, y, z} is a subset of itself. When I want to exclude such a possibility, I'll talk explicitly in terms of proper subsets; for example, the set {x, z} is a proper subset of the set {x, y, z}. Of course, no set is a proper subset of itself.
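The footnote's subset vs. proper-subset distinction maps directly onto Python's built-in set operators, which may help as a quick check (this example is mine, not the book's):

```python
# Subset (<=) includes equality; proper subset (<) excludes it.
a = {"x", "y", "z"}
b = {"x", "z"}

assert a <= a        # every set is a subset of itself
assert b <= a        # b is a subset of a
assert b < a         # b is also a proper subset of a
assert not (a < a)   # but no set is a proper subset of itself
```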