Database Reference
relational database. The relational database concept owes everything to the work of
Edgar F. Codd, a former Royal Air Force pilot and World War II veteran.
Codd has a unique place in computer science history. As digital computing was
being developed in the 1960s, early databases were structured hierarchically,
meaning that data was organized as a collection of parent-child relationships. This
type of database was sometimes convenient for the application storing the data,
especially when the data itself was inherently hierarchical (for example, a
classification of plant species). However, when the underlying data expressed complex
relationships or had no inherent hierarchy, modeling it with a hierarchical database
was very clumsy. The biggest problem was the lack of a feature that we take for granted
today: free-form search. To traverse data stored in this manner, a user had to know
the hierarchical structure in advance. It became clear that a more flexible and
generalizable model was necessary to make sense of ever-growing datasets.
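The traversal problem can be illustrated with a minimal Python sketch. The nested dictionary below is a hypothetical stand-in for a hierarchical store, not a model of any real 1960s system, and the `find_path` helper is an assumption made for illustration. A direct lookup works only when the caller already knows the full parent-child path; otherwise every branch has to be walked.

```python
# Hypothetical plant classification stored as nested parent-child records.
taxonomy = {
    "Plantae": {
        "Rosales": {
            "Rosaceae": ["Rosa canina", "Malus domestica"],
        },
        "Fagales": {
            "Fagaceae": ["Quercus robur"],
        },
    }
}

# Direct access requires knowing every level of the hierarchy in advance.
rose_family = taxonomy["Plantae"]["Rosales"]["Rosaceae"]


def find_path(tree, target, path=()):
    """Walk every branch to locate a species when its path is unknown."""
    if isinstance(tree, list):
        # Leaf level: a list of species names.
        return path if target in tree else None
    for name, child in tree.items():
        found = find_path(child, target, path + (name,))
        if found is not None:
            return found
    return None


# Without prior knowledge of the structure, the whole tree may be searched.
oak_path = find_path(taxonomy, "Quercus robur")
```

Any question that does not follow the parent-child layout (for example, "which order does this species belong to?") ends up as a full walk of the tree.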
Codd's relational model is now so ubiquitous that the concept is well known
even to the casual database user, but let's revisit the basic characteristics. Codd
proposed that each record of data be described using tuples, which are discrete sets of
values that can be individually referenced by a unique identifier. In many applications,
tuples are simply ordered lists of values, and each value can be retrieved by referencing
a position in the list. In most programming languages, tuples are zero-based,
meaning that the first element is referenced by “0,” the second by “1,” and so on. In
Codd's relational database model, each element in the database record is accessed not
by number, but by a name known as an attribute. For example, if I were to store a data
record of Edgar Codd's name, I could define an attribute called first_name to store
“Edgar” and another attribute called last_name to store “Codd.” Tuples of the same
type can be organized into tables, which can then be cross-referenced to each other
based on an existing relationship.
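The difference between positional and named access can be sketched in Python. This is only an analogy: `namedtuple` is a Python standard-library feature, not part of Codd's model, and the `Person` type is a name chosen here for illustration.

```python
from collections import namedtuple

# A plain tuple: values are retrieved by zero-based position.
record = ("Edgar", "Codd")
first = record[0]   # position 0 holds the first name
last = record[1]    # position 1 holds the last name

# A named tuple: each value is retrieved by an attribute name instead,
# loosely mirroring how a relational record exposes its attributes.
Person = namedtuple("Person", ["first_name", "last_name"])
codd = Person(first_name="Edgar", last_name="Codd")
```

With the named form, `codd.first_name` and `codd.last_name` stay meaningful even if the order of the fields changes, whereas `record[0]` depends entirely on position.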
A key component of the success of the relational database model is the idea of
normalization: in Codd's view, each unit of data should exist only once, in a single
table. This cuts down on redundancy and storage costs. More importantly,
normalization makes it possible to keep data consistent, because a value needs to be
changed in only a single location. In Codd's system, a column in each table can be
designated as a primary key, which is an attribute whose value is used to retrieve a
single record unambiguously. Primary keys can then be used to connect related records
across tables by means of a formal query syntax. The Structured Query Language (SQL)
was later created by other IBM researchers to express relational queries. Codd's
concept provides the ability to ask a variety of questions about the data in various
tables so long as the data can be related in some way.
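As a rough illustration of these ideas, the sketch below uses Python's built-in sqlite3 module with an in-memory database. It is not the book's own listing: the table layout, the `capital` column, and the integer `id` key are assumptions made here. Each capital is stored once in `countries`, and a SQL join relates people to it through the primary key.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: the country name is the primary key of "countries",
# and "people" uses an integer id because names are not guaranteed unique.
conn.executescript("""
    CREATE TABLE countries (
        name    TEXT PRIMARY KEY,
        capital TEXT NOT NULL
    );
    CREATE TABLE people (
        id         INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL,
        country    TEXT NOT NULL REFERENCES countries(name)
    );
""")

conn.executemany(
    "INSERT INTO countries (name, capital) VALUES (?, ?)",
    [("United Kingdom", "London"), ("United States", "Washington, D.C.")],
)
conn.executemany(
    "INSERT INTO people (id, first_name, last_name, country) VALUES (?, ?, ?, ?)",
    [
        (1, "Edgar", "Codd", "United Kingdom"),
        (2, "Alan", "Turing", "United Kingdom"),
        (3, "Grace", "Hopper", "United States"),
    ],
)

# Relate the two tables through the primary key of "countries".
rows = conn.execute("""
    SELECT p.first_name, p.last_name, c.capital
    FROM people AS p
    JOIN countries AS c ON c.name = p.country
    ORDER BY p.id
""").fetchall()
```

Because the capital lives only in the `countries` table, correcting it there is enough for every related person record to reflect the change.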
Let's take a look at a simple example of using relational tables in Listing 3.1.
Our example database holds two types of values: identities of computer scientists and
information about countries. Unless some very sweeping changes happen at the United
Nations, we can assume that each record in the “countries” table is unique, so we
can treat the country name as a primary key. On the other hand, it's possible for two
humans to have exactly the same name. Therefore, in our “people” table, we need to
 