Chemistry Reference
In-Depth Information
is itself a table with the chosen rows and columns from the table being
searched. When data is stored in separate tables, the tables are conceptu-
ally (or perhaps actually, depending on the RDBMS) joined together into
a larger table using the key columns to ensure the proper rows are com-
bined with each other. In the EPA examples above, epa.logP rows would
be joined with epa.compound rows where the columns cid have identical
values. The result would be a table, for example, containing the name,
molecular weight, and logP for selected compounds. For compounds with
multiple logP values, there would be multiple rows each with a differ-
ent logP value, but with the same name and molecular weight (and same
compound id).
Efficiently joining tables requires careful consideration of primary key
and foreign key columns, uniqueness, and indexing. While it is not neces-
sary to use a primary key and its related foreign key column when joining
tables, that is a very common, useful, and efficient way to join tables.
2.13 Normal Forms
There are many existing rules in SQL that prevent problems that can-
not be prevented when using flat files. For example, a column of integers
can only contain integers. In database theory, there are additional rules
or suggestions designed to ensure that data tables operate properly and
efficiently. One set of rules is referred to as normal form or normalization.
These rules are not enforced by SQL, but it is a good idea to use these
rules. This section will consider only the first three normal forms. Each
form is more involved and more restrictive. There are at least six normal
forms, but it is rare to encounter normalizations higher than three.
2.13.1 First Normal Form
A table is said to be in first normal form if each row has the same number
of columns, each column has a value, and there are no duplicate rows.
Because an RDBMS uses a table defined with a fixed number of columns, it
is always true that each row contains the same number of columns. If one
allows that null is a value, then every column will have a value. It should
be obvious that repeating a row in a table is wasteful, but also potentially
confusing and prone to error. For example, if two rows in a table of logP
contained the same name and logP, one row may have the logP changed
at some point. Then which row would be the correct row? This condition
also illustrates the final aspect of first normal form: There should be at
least one column, or combination of columns, that could function as a key
that uniquely identifies the row. This is the name column or compound id
column in the above examples. The data in this column must be unique.
Search WWH ::




Custom Search