Relational Database Fundamentals - Design and Use of Relational Databases in Chemistry

2.7 Uniqueness

In the table epa.compound, the name column is unique, meaning that no
two rows have the same name. This might be coincidentally true for any
column of any table, but in some cases the nature of the data requires that
one column be defined as unique. Before declaring a column unique, it is
essential to understand the nature of the data. For example, molecular
weight is not a unique value, because many structures share the same
molecular weight, although the small set of data in table epa.compound
happens to contain only unique values of molecular weight. Molecular
formula is also not a unique property of a molecular structure. It might be
argued that name is not unique either, and indeed that there are much
better ways to uniquely identify a molecular structure. However, since the
purpose of Table 2.2 is to provide a primary table to store each molecular
structure, it is advisable to have one unique column to prevent duplication
of rows and possible confusion if the same structure is entered multiple times.
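As a minimal sketch of the idea, the following Python script uses the standard sqlite3 module to declare a unique name column. It is an illustration under assumptions, not the schema used in this text: the table is called compound because SQLite has no epa schema here, and the formula and mw columns and the sample rows are invented for the example.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Assumed layout: only the name column is declared UNIQUE.
# formula and mw are deliberately left unconstrained.
conn.execute(
    """
    CREATE TABLE compound (
        cid     INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE,
        formula TEXT,
        mw      REAL
    )
    """
)

insert = "INSERT INTO compound (name, formula, mw) VALUES (?, ?, ?)"
conn.execute(insert, ("benzene", "C6H6", 78.11))
# Two different structures may share a formula and a molecular weight.
conn.execute(insert, ("ethanol", "C2H6O", 46.07))
conn.execute(insert, ("dimethyl ether", "C2H6O", 46.07))

# Entering the same name a second time violates the UNIQUE constraint.
try:
    conn.execute(insert, ("benzene", "C6H6", 78.11))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM compound").fetchone()[0]
```

The two C2H6O rows are accepted because neither formula nor molecular weight is declared unique, while the second benzene row is refused by the database itself rather than by application code.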

By defining the cid column in the epa.compound table, we artificially
create a column that is unique. This cid is unique in the epa.compound
table, but not unique in the epa.logP table. This is simply because the nature
of the data in the EPA schema requires that each compound be “registered”
in the epa.compound table, but it may have many logP values associated
with it. Of course, many other tables analogous to epa.logP, for example,
epa.solubility or epa.toxicity, could be added as those data become
available or important in the database. The use of multiple tables, at least one
of which defines the set of unique compounds of interest, is a hallmark of
chemical relational databases. The use of multiple independent tables is
one of the major advantages of RDBMS, allowing for easy extensibility.
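The relationship between the two tables can be sketched with Python's sqlite3 module. This is a hedged illustration only: the table names compound and logp stand in for epa.compound and epa.logP, the column layout is assumed, and the logP numbers are placeholder values rather than data from the EPA set. The foreign key is one possible way to enforce "registration"; the text does not state how the original schema does it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE compound (
        cid  INTEGER PRIMARY KEY,      -- unique: one row per compound
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE logp (
        cid  INTEGER NOT NULL REFERENCES compound (cid),  -- not unique here
        logp REAL NOT NULL
    );
    """
)

conn.execute("INSERT INTO compound (cid, name) VALUES (1, 'benzene')")

# One registered compound may carry several logP values (placeholders).
conn.execute("INSERT INTO logp (cid, logp) VALUES (1, 2.13)")
conn.execute("INSERT INTO logp (cid, logp) VALUES (1, 2.0)")

values_for_benzene = conn.execute(
    "SELECT COUNT(*) FROM logp WHERE cid = 1"
).fetchone()[0]

# A logP value for a compound that was never registered is refused.
try:
    conn.execute("INSERT INTO logp (cid, logp) VALUES (99, 1.5)")
    unregistered_rejected = False
except sqlite3.IntegrityError:
    unregistered_rejected = True
```

A solubility or toxicity table could be added in the same way, each referring back to compound through cid, without altering the existing tables.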

2.8 Sequences

The cid column of table epa.compound is a simple integer, starting with 1
and increasing up to the number of compounds. The purpose of this column
is to provide a unique key allowing tables in the schema to be related
to one another. Any unique value would suffice, but integers are typically
used because computers can store integers compactly and manipulate them
efficiently. Any method of creating unique integers to be stored in the cid
column would work, but most RDBMS provide a convenient way to generate
unique integers. The sequence function can be used to generate integers
starting with 1 (or another chosen value) and increasing by 1 (or another
chosen nonzero value). Every time a value is chosen from the sequence, that
value becomes unavailable, ensuring a set of unique integers. There can be
any number of sequences in the RDBMS. One typically defines a sequence
that is associated with a unique column and not used for other purposes.
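The behavior described above can be approximated in a short Python sketch. Note the substitution: SQLite, used here through the standard sqlite3 module, has no standalone sequence object (PostgreSQL, for instance, offers CREATE SEQUENCE and nextval), so the example uses SQLite's AUTOINCREMENT keyword as the nearest stand-in. The table layout and the register helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# AUTOINCREMENT makes SQLite hand out ever-increasing cid values and
# never reuse one, even after the row that held it has been deleted.
conn.execute(
    """
    CREATE TABLE compound (
        cid  INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL UNIQUE
    )
    """
)


def register(name):
    """Insert a compound and return the cid generated for it."""
    cur = conn.execute("INSERT INTO compound (name) VALUES (?)", (name,))
    return cur.lastrowid


first = register("benzene")
second = register("toluene")
third = register("phenol")

# Remove the most recent compound, then register a new one.
conn.execute("DELETE FROM compound WHERE cid = ?", (third,))
fourth = register("aniline")
```

The first three compounds receive 1, 2, and 3. After phenol is deleted, aniline receives 4 rather than the freed value 3, which mirrors the rule that a value taken from a sequence becomes unavailable.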
