a particular purpose, say, for possible purchase or for screening in some
assay. Suppose the data are in the form of a molfile. If the structures and
data in the file are added to an RDBMS having chemical extensions, the data
become immediately more useful. Importing the molfile into, say, a table
containing smiles, name, id, and columns of molecular data immediately
ensures several things. First, it ensures that each molecular structure
is valid and can be represented as a simplified molecular input line
entry system (SMILES) string. Second, it ensures that the data values
required to be numeric truly are numeric. Third, if chemical data types
are implemented in the RDBMS, constraints using those data types can be
applied. The more chemical functionality there is in the RDBMS, the more
information about the structures is validated and readily available.
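The validation-on-import idea can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the table name vendor_structures, the mw column, and the sample rows are hypothetical, and SQLite has no chemical extensions, so the structure check here is reduced to "non-empty text". An RDBMS with a chemical cartridge would validate the SMILES itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: id, name, smiles, and one numeric data column (mw).
# The CHECK on mw rejects any value that SQLite could not store as a number.
conn.execute("""
    CREATE TABLE vendor_structures (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        smiles TEXT NOT NULL CHECK (length(smiles) > 0),
        mw     REAL CHECK (mw IS NULL OR typeof(mw) IN ('real', 'integer'))
    )
""")

# Illustrative records as they might be parsed from an incoming file.
rows = [
    (1, "ethanol", "CCO", 46.07),
    (2, "benzene", "c1ccccc1", 78.11),
    (3, "bad record", "CC(=O)O", "n/a"),  # non-numeric value in a numeric field
]

rejected = []
for row in rows:
    try:
        conn.execute("INSERT INTO vendor_structures VALUES (?, ?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        # The constraint violation surfaces at import time, not later in analysis.
        rejected.append((row[0], str(exc)))

loaded = conn.execute("SELECT COUNT(*) FROM vendor_structures").fetchone()[0]
```

The point of the design is that bad values are caught once, by the table definition, rather than by every client program that later reads the data.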
There are other advantages to importing structural data into the
RDBMS as soon as possible. Depending on which other tables exist, it
becomes easy to discover which of the new structures are already
contained elsewhere in the RDBMS. The data in the new table will be
easily accessible to other users and client applications. Once a decision
has been made about which new structures are of interest, these can be
readily moved to other tables in the RDBMS for further work (purchasing,
testing, synthesis, etc.).
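As a rough sketch of these two operations, the following uses sqlite3 with hypothetical tables (inventory, vendor_structures, purchase_candidates). It assumes that SMILES strings in both tables were canonicalized the same way, so plain string equality identifies the same structure; with chemical extensions, the RDBMS would normally handle that comparison itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical existing table and newly imported table.
    CREATE TABLE inventory (smiles TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE vendor_structures (id INTEGER PRIMARY KEY, smiles TEXT, name TEXT);
    CREATE TABLE purchase_candidates (smiles TEXT PRIMARY KEY, name TEXT);

    INSERT INTO inventory VALUES ('CCO', 'ethanol');
    INSERT INTO vendor_structures VALUES
        (1, 'CCO', 'ethanol'),
        (2, 'c1ccccc1', 'benzene'),
        (3, 'CC(=O)O', 'acetic acid');
""")

# Which new structures are already held in another table?
already_held = [r[0] for r in conn.execute("""
    SELECT v.name
    FROM vendor_structures v
    JOIN inventory i ON i.smiles = v.smiles
    ORDER BY v.id
""")]

# Move the structures of interest (here: those not yet held) to a working table.
conn.execute("""
    INSERT INTO purchase_candidates (smiles, name)
    SELECT v.smiles, v.name
    FROM vendor_structures v
    WHERE NOT EXISTS (SELECT 1 FROM inventory i WHERE i.smiles = v.smiles)
""")

candidates = [r[0] for r in conn.execute(
    "SELECT name FROM purchase_candidates ORDER BY name")]
```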
Another advantage of using an RDBMS to store chemical data is
simply one of organization. It is very common to have dozens of files of
molecular structures. One typically tries to remind oneself where the file
came from, when it was received, what the purpose of the file is, etc. Using
encoded names for the files or the folders containing the files is a typical
approach. This quickly becomes unwieldy and confusing. On the other
hand, if RDBMS tables are created to contain these data, sensible column
and table names can be created to store information otherwise encoded in
file and folder names. In addition, the generous use of table and column
comments helps make sense of large amounts of data.
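One possible way to move provenance out of file names and into the database is sketched below with sqlite3. The table and column names (structure_sources, supplier, received_on, purpose) and all sample values are hypothetical. SQLite does not store table or column comments; in PostgreSQL, the COMMENT ON statement serves that purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Where each batch of structures came from, when, and why.
    CREATE TABLE structure_sources (
        source_id   INTEGER PRIMARY KEY,
        supplier    TEXT NOT NULL,
        received_on TEXT NOT NULL,   -- ISO 8601 date string
        purpose     TEXT NOT NULL
    );
    CREATE TABLE structures (
        id        INTEGER PRIMARY KEY,
        source_id INTEGER NOT NULL REFERENCES structure_sources(source_id),
        smiles    TEXT NOT NULL
    );

    -- Made-up sample data for illustration only.
    INSERT INTO structure_sources VALUES
        (1, 'Vendor A', '2024-01-15', 'possible purchase'),
        (2, 'Vendor B', '2024-03-02', 'assay screening');
    INSERT INTO structures VALUES
        (1, 1, 'CCO'),
        (2, 2, 'c1ccccc1'),
        (3, 2, 'CC(=O)O');
""")

# Provenance is now queryable instead of being encoded in a file name.
screening = conn.execute("""
    SELECT s.smiles
    FROM structures s
    JOIN structure_sources src USING (source_id)
    WHERE src.purpose = 'assay screening'
    ORDER BY s.id
""").fetchall()
```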
In short, it is possible and desirable to use an RDBMS to replace many,
if not all, of the ways in which computer files are used. There are many
advantages to using an RDBMS to store chemical data compared with
using flat files. Although file operations (open, read, write) are familiar
from most client programs, they are easily replaced by SQL
operations on the data in the RDBMS. In fact, many operations typically
carried out by client programs can be done entirely within the RDBMS
using chemical extensions and procedural languages to write new SQL
functions.
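The idea of adding new SQL functions can be illustrated, loosely, with sqlite3, where a Python function is registered with the connection and then called from SQL. This is an analogue rather than the server-side procedural languages the text refers to, and the function mw_below, the structures table, and the cutoff of 60.0 are arbitrary choices made for the example.

```python
import sqlite3

def mw_below(mw, cutoff):
    """Return 1 if mw is below cutoff, 0 otherwise; NULL stays NULL."""
    if mw is None:
        return None
    return 1 if mw < cutoff else 0

conn = sqlite3.connect(":memory:")
# Register the Python function as a two-argument SQL function.
conn.create_function("mw_below", 2, mw_below)

conn.executescript("""
    CREATE TABLE structures (name TEXT, smiles TEXT, mw REAL);
    INSERT INTO structures VALUES
        ('ethanol', 'CCO', 46.07),
        ('benzene', 'c1ccccc1', 78.11),
        ('acetic acid', 'CC(=O)O', 60.05);
""")

# The filter runs inside the query; no file is opened, read, or parsed.
small = [r[0] for r in conn.execute(
    "SELECT name FROM structures WHERE mw_below(mw, 60.0) ORDER BY name")]
```

A client that once looped over a file to apply such a filter can instead issue a single SELECT and receive only the rows it needs.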
12.3 Advanced SQL Techniques
Chapter 5 introduced ways in which client programs can be used with
an RDBMS server. Some existing client programs, such as Excel and R,