Relational Database Fundamentals - Design and Use of Relational Databases in Chemistry

Chemistry Reference

In-Depth Information

is itself a table with the chosen rows and columns from the table being

searched. When data is stored in separate tables, the tables are conceptu-

ally (or perhaps actually, depending on the RDBMS) joined together into

a larger table using the key columns to ensure the proper rows are com-

bined with each other. In the EPA examples above, epa.logP rows would

be joined with epa.compound rows where the columns cid have identical

values. The result would be a table, for example, containing the name,

molecular weight, and logP for selected compounds. For compounds with

multiple logP values, there would be multiple rows each with a differ-

ent logP value, but with the same name and molecular weight (and same

compound id).

Efficiently joining tables requires careful consideration of primary key

and foreign key columns, uniqueness, and indexing. While it is not neces-

sary to use a primary key and its related foreign key column when joining

tables, that is a very common, useful, and efficient way to join tables.

2.13 Normal Forms

There are many existing rules in SQL that prevent problems that can-

not be prevented when using flat files. For example, a column of integers

can only contain integers. In database theory, there are additional rules

or suggestions designed to ensure that data tables operate properly and

efficiently. One set of rules is referred to as normal form or normalization.

These rules are not enforced by SQL, but it is a good idea to use these

rules. This section will consider only the first three normal forms. Each

form is more involved and more restrictive. There are at least six normal

forms, but it is rare to encounter normalizations higher than three.

2.13.1 First Normal Form

A table is said to be in first normal form if each row has the same number

of columns, each column has a value, and there are no duplicate rows.

Because an RDBMS uses a table defined with a fixed number of columns, it

is always true that each row contains the same number of columns. If one

allows that null is a value, then every column will have a value. It should

be obvious that repeating a row in a table is wasteful, but also potentially

confusing and prone to error. For example, if two rows in a table of logP

contained the same name and logP, one row may have the logP changed

at some point. Then which row would be the correct row? This condition

also illustrates the final aspect of first normal form: There should be at

least one column, or combination of columns, that could function as a key

that uniquely identifies the row. This is the name column or compound id

column in the above examples. The data in this column must be unique.

Search WWH ::

Custom Search

Home