Data Storage, Searching, and Manipulation - Design and Use of Relational Databases in Chemistry

Chemistry Reference

In-Depth Information

Some items are simple; for example, molecular weight is a numeric value

and time and date can be represented using the timestamp SQL data type.

Other items are more complex, for example, compound location. For com-

plex items, one must consider the components of that data that can be

represented using the available SQL data types. For example, compound

location could simply be stored as a text string. If compound location is

needed only in reports, a text string is a good solution. However, in some

applications, the location needs to be more flexibly defined. It might be

necessary to include a stockroom, a laboratory, a cabinet, etc. Rather than

store this in a single encoded text string, it is much better to define a set

of data columns to hold this information so that it can be readily searched

and updated. This defines a new complex data type composed of built-in

data types, such as numeric, text, or date. Defining a table and its set of

columns and defining the new data type does this. A general rule is to

avoid encoding information in a single column. Rather, design a table (or

new data type) to hold the individual components of the data.

Figure 6.1 shows an entity-relationship diagram for a compound-track-

ing schema. The table named registry defines the sample. The sam-

ple _ id is used to relate other tables in the schema. This sample _ id is a

unique integer that may not be null and will serve as a primary key for this

table and as a foreign key for other tables. In the figure, these attributes are

encoded in the diagram using the asterisk (not null), PK (primary key), FK

(foreign key), and PF (primary foreign key). When a sample is first entered,

a new sample _ id must be used. This will be enforced by the RDBMS

because it is declared to be unique. For convenience, many RDBMS pro-

vide a serial data type, or sequence-generating functions, that provides

the next integer in a series. However, the person or computer program

responsible for making the initial entries in the registry table might also

ensure that the sample _ id is unique among the other sample _ ids in

the table. In addition, a date is stored in the column named registered

and may not be a null value. Finally, a parent _ sample _ id is stored.

This will be used when partial samples, rather than an entire container of

the sample are checked out. This is still called a foreign key even though the

primary key to which it is related is contained in the same table. Initially,

the parent _ sample _ id will be set equal to the sample _ id , indicat-

ing it is the primary entry for this sample. When a sample is taken, a new

sample _ id will be generated and its parent _ sample _ id will be set

equal to the sample _ id from which this subsample is taken. The regis-

tered date will also be stored for this subsample.

The next table to consider is the checkout table. When a sample is

taken, either the parent sample or a subsample, data are recorded in the

checkout table. A sample _ id is recorded, the checked _ out date is

recorded, a valid location _ id from the alocation table is chosen,

and a valid chemist _ id from the chemist table is chosen. Note that

Design and Use of Relational Databases in Chemistry

Search WWH ::

Custom Search

Home