using SQL. The properties are not stored as columns in the structure table.
Instead, a separate property table is created and related to the structure
table through the structure id, which is the primary key of the structure
table. If necessary, the entire molecular structure file can be stored as a
text string, so that the database also serves as a repository of these files.
The file text should be stored in a separate table related by a foreign
key to the main structure table, which contains the unique primary key.
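The table layout described above can be sketched as follows. This is only a
minimal illustration, not the schema used in this book: it uses Python's
built-in sqlite3 module as a stand-in for the database, and the table names
(structure, property, structure_file), the column names, and the example row
are all assumptions made for the sketch.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

# Hypothetical schema: one main structure table, plus separate tables for
# properties and for the raw file text, each related by the structure id.
conn.executescript("""
CREATE TABLE structure (
    id     INTEGER PRIMARY KEY,
    smiles TEXT NOT NULL
);
CREATE TABLE property (
    id    INTEGER NOT NULL REFERENCES structure(id),
    name  TEXT NOT NULL,
    value TEXT
);
CREATE TABLE structure_file (
    id      INTEGER NOT NULL REFERENCES structure(id),
    molfile TEXT NOT NULL
);
""")

# Example data (illustrative values only).
conn.execute("INSERT INTO structure (id, smiles) VALUES (1, 'CCO')")
conn.executemany(
    "INSERT INTO property (id, name, value) VALUES (?, ?, ?)",
    [(1, "name", "ethanol"), (1, "molecular_weight", "46.07")],
)
conn.execute(
    "INSERT INTO structure_file (id, molfile) VALUES (?, ?)",
    (1, "(full text of the molfile would be stored here)"),
)

# Properties are retrieved by joining on the structure id.
rows = conn.execute(
    "SELECT s.smiles, p.name, p.value "
    "FROM structure s JOIN property p ON p.id = s.id "
    "ORDER BY p.name"
).fetchall()
print(rows)
```

Because each property is a row rather than a column, a new kind of property
can be added for any structure without altering the structure table.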
Although SMILES is an entirely equivalent way of storing a connec-
tion table of atoms and bonds, it is sometimes desirable to create a tradi-
tional connection table, for example, when an external program requires
it. The extension functions smiles_to_symbols and smiles_to_bonds
accept a SMILES string and produce an array of either symbols
or bonds. These are discussed in a later section of this chapter. Several
implementations of these functions are shown in the Appendix.
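To make the idea of deriving a connection table from SMILES concrete, here is
a rough Python sketch. It is not one of the Appendix implementations: the
Python functions below merely borrow the names smiles_to_symbols and
smiles_to_bonds, the (atom, atom, order) tuple format for bonds is an
assumption, and only a small subset of SMILES is handled (uppercase
organic-subset atoms, the bond symbols - = #, branches, and single-digit
ring closures).

```python
# Illustrative subset only. Bracket atoms, aromatic (lowercase) atoms,
# charges, and stereo markers are not supported and raise ValueError.
TWO_LETTER = ("Cl", "Br")
ONE_LETTER = set("BCNOPSFI")
BOND_ORDER = {None: 1, "-": 1, "=": 2, "#": 3}


def parse_smiles(smiles):
    """Return (symbols, bonds); bonds are (atom_i, atom_j, order), 1-based."""
    symbols = []        # element symbol of each atom, in SMILES order
    bonds = []
    prev = None         # atom that the next atom will be bonded to
    pending = None      # explicit bond symbol waiting for its second atom
    branch_stack = []   # atoms to return to when a branch closes
    open_rings = {}     # ring-closure digit -> (atom index, bond symbol)
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        two = smiles[i:i + 2]
        if two in TWO_LETTER or ch in ONE_LETTER:
            sym = two if two in TWO_LETTER else ch
            symbols.append(sym)
            cur = len(symbols)
            if prev is not None:
                bonds.append((prev, cur, BOND_ORDER[pending]))
            prev, pending = cur, None
            i += len(sym)
            continue
        if ch in "-=#":
            pending = ch
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isdigit():
            if ch in open_rings:
                # Second occurrence of the digit closes the ring.
                start, start_bond = open_rings.pop(ch)
                bonds.append((start, prev, BOND_ORDER[pending or start_bond]))
            else:
                open_rings[ch] = (prev, pending)
            pending = None
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
        i += 1
    return symbols, bonds


def smiles_to_symbols(smiles):
    return parse_smiles(smiles)[0]


def smiles_to_bonds(smiles):
    return parse_smiles(smiles)[1]


print(smiles_to_symbols("CC(=O)O"))
print(smiles_to_bonds("CC(=O)O"))
```

For acetic acid, CC(=O)O, the symbols come out in the order the atoms appear
in the SMILES string, and each bond refers to atoms by their position in that
list.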
It may also be desirable to store the atomic coordinates read from
these files. The purpose of parsing the coordinates from the file and put-
ting them into a separate column is to enable use of the coordinates from
within the database. If the column is properly defined as a numeric or
float column, this will also ensure that the coordinates are proper num-
bers. If there is no need for atomic coordinates, it is not necessary to cre-
ate a column for these. Later sections of this chapter will discuss ways in
which these atomic coordinates might be used in SQL functions.
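As an illustration of parsing coordinates into proper numbers, the following
Python sketch reads the atom block of a V2000 molfile using its fixed-width
layout (atom count in the first three characters of the counts line, then x,
y, and z in three 10-character fields per atom line). The function name
molfile_coordinates and the example molfile, including its made-up
coordinates, are assumptions for this sketch and are not the functions
defined in the Appendix.

```python
def molfile_coordinates(molfile):
    """Return a list of (x, y, z) float tuples, one per atom, in file order."""
    lines = molfile.splitlines()
    # Line 4 of a V2000 molfile is the counts line; columns 1-3 hold the atom count.
    natoms = int(lines[3][0:3])
    coords = []
    for line in lines[4:4 + natoms]:
        # float() raises ValueError on malformed input, much as a numeric
        # or float column would reject a value that is not a proper number.
        x, y, z = (float(line[k:k + 10]) for k in (0, 10, 20))
        coords.append((x, y, z))
    return coords


# Hypothetical three-atom example; the coordinates are invented for illustration.
EXAMPLE_MOLFILE = "\n".join([
    "example",
    "  hypothetical",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0",
    "    1.5000    0.0000    0.0000 C   0  0",
    "    2.2000    1.2000    0.0000 O   0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])

print(molfile_coordinates(EXAMPLE_MOLFILE))
```

The resulting list of floats could then be stored in a numeric or float array
column alongside the structure.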
In a molecular structure file, an atom record typically contains all of the
information about that atom: the atomic number or symbol, the charge, coor-
dinates, etc. When such a file is parsed into a SMILES string and an array of
coordinates, it is important to be able to associate the proper coordinate with
the proper atom. The use of canonical SMILES ensures this. Because canoni-
cal SMILES defines a unique order of the atoms in a molecule, that order is
used to store the coordinates. Later sections of this chapter will discuss ways
in which atomic coordinates might be stored in columns of a table.
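The association between atoms and coordinates can be sketched as a simple
reordering. The example below assumes that the program producing the
canonical SMILES also reports, for each canonical position, the index of the
corresponding atom in the original file; how that mapping is obtained depends
on the toolkit. The function name reorder_coordinates, the canonical_order
list, and the data are hypothetical.

```python
def reorder_coordinates(coords, canonical_order):
    """Return coords rearranged into canonical atom order.

    coords[i] holds the (x, y, z) of file atom i (0-based).
    canonical_order[k] is the file index of the atom at canonical position k.
    """
    # Every file atom must appear exactly once in the mapping.
    if sorted(canonical_order) != list(range(len(coords))):
        raise ValueError("canonical_order must be a permutation of the atom indices")
    return [coords[i] for i in canonical_order]


# Hypothetical file lists the atoms as O, C, C, while the canonical SMILES
# is CCO, so canonical position 0 is file atom 2, and so on.
file_coords = [(2.2, 1.2, 0.0), (1.5, 0.0, 0.0), (0.0, 0.0, 0.0)]
canonical_order = [2, 1, 0]

print(reorder_coordinates(file_coords, canonical_order))
```

After reordering, the k-th coordinate belongs to the k-th atom of the
canonical SMILES, so no separate atom labels need to be stored with the
coordinates.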
There are many programs available to parse the various molecular
structure file formats. OpenBabel is an open-source program that can read
many file formats and produce a SMILES representation of molecular
structure. There are also many commercial products that can do this as
well. In the following examples, the OpenBabel/plpythonu implementa-
tion of molfile parsing will be used. This was introduced in Chapter 10.
The code to define the necessary functions is shown in the Appendix.
11.4 Processing SDF Files
A common way of distributing structural and chemical data is in the form
of an SDF file. An SDF file is a collection of compounds stored in molfile
format and separated with a record containing the string $$$$. Many com-
pound vendors make their libraries available this way. Many research
publications include SDF files of structures and data. In the following