Chemistry Reference
In-Depth Information
It is not required that such a column actually be used as an SQL key, but
it is wise to do so. In this way, the SQL uniqueness constraints can help
to ensure that the table is in first normal form. While this one column
must be unique, it is entirely possible and even likely that some other data
values will not be unique. For example, there are expected to be many
compounds that coincidentally have the same logP. This does not violate
first normal form.
Another way to think about first normal form is to simply consider
what the table is intended to contain. If it contains data about structures,
then there must be a unique way to identify those structures, for example,
by name. For a table that contains information about structures, it must
contain only information about structures. It must contain simple values
that are associated with that structure, for example, molecular weight or
logP. It must not contain complex information about its data values, such
as the method used to determine logP.
In the above example, there were cases where multiple logP values for
a single compound became important. If these were inserted into the table
of structures, this would violate first normal form. So they were removed
to a second table related to the first. When there are multiple values, there
should be a separate table that contains information about those values.
The table of structures must not be used to contain information about its
data values but only information about structures.
2.13.2 Second Normal Form
A table in second normal form must have data values that depend only on
the key column that was identified while making sure the table satisfies
first normal form. Using the EPA Table 2.1 example, suppose additional
logP values needed to be recorded because measurements were made
at various temperatures. If a logP_temp column were added, this would
accommodate multiple logP values and the pair of columns, logP and
logP_temp, taken together would be unique. But each logP value would
depend on the logP_temp column, not just on the key column. This vio-
lates second normal form. The solution shown above in which logP values
were removed to a separate table is the correct way to normalize to both
first and second normal form.
One might be tempted to encode information into one column. For
example, the logP values could be forced to fit into one column by encod-
ing temperature along with the logP value, for example “1.55(25C).” While
this would formally satisfy second normal form, this is a really bad idea.
It would force the logP column to have to be a text column, thus eliminat-
ing the ability of the RDBMS to ensure numeric values of logP. It would
further force users and database developers to follow new syntax rules
and write functions to parse fields from within a column.
Search WWH ::




Custom Search