2.7 Uniqueness
In the table epa.compound, the name column is unique, meaning that no
two rows have the same name. This might be coincidentally true for any
column of any table, but in some cases the nature of the data requires that
one column be defined as unique. Before declaring a column in a table
unique, it is essential to understand the nature of the data. For example,
molecular weight is not a unique value, as many structures share the
same molecular weight, although the small set of data in table epa.compound
happens to have only unique values of molecular weight. Molecular
formula is also not a unique property of a molecular structure. It might be
argued that name is not unique either and, indeed, there are much better ways
to uniquely identify a molecular structure. However, since the purpose of
Table 2.2 is to provide a primary table to store each molecular structure, it
is advisable to have one unique column to prevent duplication of rows and
possible confusion if the same structure is entered multiple times.
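The effect of declaring name unique can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions, not the book's Table 2.2: an in-memory database is attached under the schema name epa so the table reads as epa.compound, and the columns mf (molecular formula) and mw (molecular weight) are hypothetical stand-ins. Ethanol and dimethyl ether are used because they genuinely share the formula C2H6O, and therefore the same molecular weight, so neither mf nor mw could serve as a unique key, while a second "ethanol" row is refused by the UNIQUE constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Attach a second in-memory database as schema "epa" so names match the text.
conn.execute("ATTACH DATABASE ':memory:' AS epa")
# mf and mw are hypothetical column names for this sketch.
conn.execute(
    """
    CREATE TABLE epa.compound (
        cid  INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE,  -- duplicates are rejected
        mf   TEXT,                  -- molecular formula: not unique
        mw   REAL                   -- molecular weight: not unique
    )
    """
)

# Two different structures with identical formula and molecular weight.
conn.execute("INSERT INTO epa.compound (cid, name, mf, mw) "
             "VALUES (1, 'ethanol', 'C2H6O', 46.07)")
conn.execute("INSERT INTO epa.compound (cid, name, mf, mw) "
             "VALUES (2, 'dimethyl ether', 'C2H6O', 46.07)")

try:
    # Entering ethanol a second time violates the UNIQUE constraint on name.
    conn.execute("INSERT INTO epa.compound (cid, name, mf, mw) "
                 "VALUES (3, 'ethanol', 'C2H6O', 46.07)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM epa.compound").fetchone()[0]
print(duplicate_rejected)  # True
print(row_count)           # 2
```

The shared mf and mw values are accepted without complaint; only the repeated name is blocked.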
By defining the cid column in the epa.compound table, we artificially
create a column that is unique. This cid is unique in the epa.compound
table, but not unique in the epa.logP table. This is simply because the nature
of the data in the EPA schema requires that each compound be “registered”
in the epa.compound table, but it may have many logP values associated
with it. Of course, many other tables analogous to epa.logP, for example,
epa.solubility or epa.toxicity, could be added as those data become available
or important in the database. The use of multiple tables, at least one
of which defines the set of unique compounds of interest, is a hallmark of
chemical relational databases. The use of multiple independent tables is
one of the major advantages of an RDBMS, allowing for easy extensibility:
a new kind of data can be added as a new table without restructuring the
existing ones.
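The one-to-many relationship between epa.compound and epa.logP can be sketched the same way. Assumptions: SQLite through Python's sqlite3, a hypothetical column name logp_value, and logP numbers that are placeholders for illustration rather than curated measurements. The cid column is the primary key of epa.compound but may repeat freely in epa.logP, and the REFERENCES clause (which SQLite enforces only after PRAGMA foreign_keys = ON) refuses logP rows for compounds that were never registered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
conn.execute("ATTACH DATABASE ':memory:' AS epa")
conn.executescript(
    """
    CREATE TABLE epa.compound (
        cid  INTEGER PRIMARY KEY,          -- unique: one row per registered compound
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE epa.logP (
        cid        INTEGER NOT NULL REFERENCES compound(cid),  -- repeats allowed
        logp_value REAL NOT NULL          -- hypothetical column name
    );
    """
)

# Register each compound exactly once...
conn.executemany("INSERT INTO epa.compound (cid, name) VALUES (?, ?)",
                 [(1, "ethanol"), (2, "benzene")])
# ...then attach any number of logP values to each cid.
# These numbers are placeholders for illustration, not curated data.
conn.executemany("INSERT INTO epa.logP (cid, logp_value) VALUES (?, ?)",
                 [(1, -0.31), (1, -0.30), (2, 2.13)])

try:
    # cid 99 was never registered in epa.compound, so this row is refused.
    conn.execute("INSERT INTO epa.logP (cid, logp_value) VALUES (99, 1.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Count how many logP values each registered compound has.
counts = conn.execute(
    """
    SELECT c.name, COUNT(l.logp_value)
    FROM epa.compound AS c
    LEFT JOIN epa.logP AS l ON l.cid = c.cid
    GROUP BY c.cid
    ORDER BY c.cid
    """
).fetchall()
print(fk_rejected)  # True
print(counts)       # [('ethanol', 2), ('benzene', 1)]
```

A table such as epa.solubility would follow the same pattern: another child table keyed on cid, with no change to epa.compound.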
2.8 Sequences
The cid column of table epa.compound is a simple integer, starting with 1
and increasing up to the number of compounds. The purpose of this column
is to provide a unique key allowing tables in the schema to be related
to one another. Any unique value would suffice, but integers are typically
used because computers can store integers compactly and manipulate them
efficiently. Any method of creating unique integers to be stored in the cid
column would work, but most RDBMSs provide a convenient way to generate
unique integers. The sequence function can be used to generate integers
starting with 1 (or another chosen value) and increasing by 1 (or another
chosen nonzero value). Every time a value is chosen from the sequence, that
value becomes unavailable, ensuring a set of unique integers. There can be
any number of sequences in the RDBMS. One typically defines a sequence
that is associated with a unique column and not used for other purposes.
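The sequence behavior described above can be sketched in Python. This is not how any particular RDBMS implements sequences; in PostgreSQL, for example, the real mechanism is a CREATE SEQUENCE statement (with START and INCREMENT options) together with the nextval() function. The class name Sequence, its nextval method, and the name compound_cid_seq are hypothetical, and the counter here lives only in process memory rather than in the database.

```python
import itertools
import threading


class Sequence:
    """Hypothetical in-memory stand-in for a database sequence.

    Each nextval() call hands out the next integer in start, start+increment,
    start+2*increment, ... A value, once handed out, is never handed out again,
    even if the caller never stores it.
    """

    def __init__(self, start: int = 1, increment: int = 1) -> None:
        if increment == 0:
            raise ValueError("increment must be nonzero")
        self._counter = itertools.count(start, increment)
        self._lock = threading.Lock()  # concurrent callers never share a value

    def nextval(self) -> int:
        with self._lock:
            return next(self._counter)


# One sequence dedicated to the cid column and used for nothing else.
compound_cid_seq = Sequence()
cids = {name: compound_cid_seq.nextval() for name in ["benzene", "toluene", "phenol"]}
print(cids)  # {'benzene': 1, 'toluene': 2, 'phenol': 3}

# A drawn-but-discarded value is still consumed, leaving a gap.
discarded = compound_cid_seq.nextval()  # 4, never stored anywhere
next_cid = compound_cid_seq.nextval()   # 5

# Other starting values and any nonzero step, including negative ones.
odd = Sequence(start=1, increment=2)
odd_values = [odd.nextval() for _ in range(3)]   # [1, 3, 5]
down = Sequence(start=10, increment=-5)
countdown = [down.nextval() for _ in range(3)]   # [10, 5, 0]
```

The gap left by the discarded value is deliberate: in PostgreSQL, a value obtained from nextval() is not returned to the sequence even if the transaction that drew it rolls back, so gaps in a cid column are normal and harmless as long as the values remain unique.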