many useful features would become available, such as canonicalization
and searching. As described here, SMILES would not truly be an SQL
data type, because it is actually represented as a text string. SQL can be
extended even further to make SMILES a data type equal in every way to
the other standard SQL data types; this is discussed in a later section of
this chapter.
Another choice for the internal representation of molecular structure
is the molfile. SQL functions like those described in this chapter could be
constructed to operate on this type of data. One disadvantage of molfiles
is that they are larger than the equivalent SMILES. One advantage is that
they can store atomic coordinates, which SMILES cannot. Other molecular
file formats exist, but they are substantially the same as the molfile,
except perhaps for specific atom types that may be useful in some
database applications.
The recommendation here is to use SMILES to store the molecular
structure itself. If other features of the molecule or its atoms need to be
stored, columns of other data types can be added to the row describing
the molecule. It is the “SQL way” not to encode a lot of information into
a single data type. When a molfile is used as the structural data type, too
much data is encoded in one value: the individual data items must be
parsed and validated, and errors creep into the data through missing,
extra, or invalid portions of the molfile. Ways of storing atomic
coordinates, atom types, and molecular properties are discussed in Chapter 11.
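A minimal sketch of this design follows, using Python's built-in sqlite3 module purely for illustration. The table and column names (molecule, name, mol_weight) are hypothetical, and the molecular weights are rounded values for ethanol and acetic acid. The structure lives in one SMILES text column, while each property gets its own typed column, so filtering on a property is an ordinary WHERE clause that never has to parse the structure string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: structure in one column, properties in their own columns.
conn.execute("""
    CREATE TABLE molecule (
        id         INTEGER PRIMARY KEY,
        smiles     TEXT NOT NULL,   -- molecular structure only
        name       TEXT,            -- illustrative property column
        mol_weight REAL             -- illustrative property column (g/mol)
    )
""")
conn.executemany(
    "INSERT INTO molecule (smiles, name, mol_weight) VALUES (?, ?, ?)",
    [("CCO", "ethanol", 46.07), ("CC(=O)O", "acetic acid", 60.05)],
)

# Querying a property needs no parsing of the SMILES string.
rows = conn.execute(
    "SELECT name, smiles FROM molecule WHERE mol_weight < 50"
).fetchall()
print(rows)  # [('ethanol', 'CCO')]
```

Had the weight been buried inside a molfile text column instead, the same query would first require extracting and validating it from the file contents.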
The external representation of molecular structure is less rigorously
defined. For example, many available programs can convert to and from
SMILES and molfiles. These can be used when a molfile (the external
representation) needs to be imported into the database as a SMILES (the
internal representation). Similarly, a SMILES can easily be exported as a
SMILES or converted to a molfile or another file format. It is useful to
have these conversion functions available as SQL extensions.
Consider the extended SQL functions smiles_to_molfile and
molfile_to_smiles. Having these functions available as SQL extensions
allows one to export a molfile from a table containing SMILES. For
example:
SELECT smiles, smiles_to_molfile(smiles) FROM atable;
outputs a SMILES string and a molfile as a text string. Similarly, the
function molfile_to_smiles could be used to convert a text string
representation of a structure to SMILES. If the advice here is followed, a
column of molfiles would not be the internal representation of molecular
structure. Nevertheless, this advice should not be construed as a
recommendation against ever using molfiles. Having one column for
SMILES and another for molfiles will fit the needs of many database
designers.
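The sketch below suggests how smiles_to_molfile and molfile_to_smiles might be registered as SQL functions, here in SQLite through Python's sqlite3.Connection.create_function. It is a toy under stated assumptions: both converters are hypothetical helpers that handle only unbranched chains of organic-subset atoms (for example CCO or C=CC#N), write V2000 molfiles with every coordinate set to zero, and do not canonicalize. A production system would instead delegate the conversion to one of the existing conversion programs.

```python
import re
import sqlite3

# Hypothetical toy converters (not a library API): unbranched chains only.
_ATOM = re.compile(r"Cl|Br|[BCNOPSFI]")  # two-letter symbols tried first
_ORDER = {"-": 1, "=": 2, "#": 3}
_SYMBOL = {1: "", 2: "=", 3: "#"}  # single bonds are implicit in SMILES


def smiles_to_molfile(smiles: str) -> str:
    """Convert a linear SMILES such as 'CCO' to a V2000 molfile string."""
    atoms, bonds, pending, pos = [], [], None, 0
    while pos < len(smiles):
        if smiles[pos] in _ORDER:
            if not atoms or pending is not None:
                raise ValueError(f"misplaced bond symbol at position {pos}")
            pending, pos = _ORDER[smiles[pos]], pos + 1
            continue
        m = _ATOM.match(smiles, pos)
        if m is None:  # branches, rings, brackets, aromatic atoms, ...
            raise ValueError(f"unsupported character {smiles[pos]!r} at {pos}")
        atoms.append(m.group())
        if len(atoms) > 1:  # bond the new atom to the previous one
            bonds.append((len(atoms) - 1, len(atoms), pending or 1))
        pending, pos = None, m.end()
    if not atoms or pending is not None:
        raise ValueError("empty SMILES or dangling bond")

    lines = ["", "", ""]  # name, program, and comment header lines
    lines.append(f"{len(atoms):3d}{len(bonds):3d}" + "  0" * 8 + "999 V2000")
    zero_xyz = f"{0.0:10.4f}" * 3  # SMILES carries no coordinates
    for sym in atoms:
        lines.append(f"{zero_xyz} {sym:<3}" + " 0" + "  0" * 11)
    for i, j, order in bonds:
        lines.append(f"{i:3d}{j:3d}{order:3d}  0")
    lines.append("M  END")
    return "\n".join(lines)


def molfile_to_smiles(molfile: str) -> str:
    """Convert a V2000 molfile describing an unbranched chain back to SMILES."""
    lines = molfile.split("\n")
    n_atoms, n_bonds = int(lines[3][0:3]), int(lines[3][3:6])
    symbols = [line[31:34].strip() for line in lines[4:4 + n_atoms]]
    adj = {i: [] for i in range(1, n_atoms + 1)}
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        i, j, order = int(line[0:3]), int(line[3:6]), int(line[6:9])
        if order not in _SYMBOL:
            raise ValueError(f"unsupported bond order {order}")
        adj[i].append((j, order))
        adj[j].append((i, order))
    ends = [i for i in adj if len(adj[i]) <= 1]
    if n_bonds != n_atoms - 1 or any(len(v) > 2 for v in adj.values()) or not ends:
        raise ValueError("only unbranched, acyclic chains are supported")
    # Walk the chain from one end, emitting bond symbols between atoms.
    prev, cur, out = None, ends[0], [symbols[ends[0] - 1]]
    while True:
        nxt = [(j, o) for j, o in adj[cur] if j != prev]
        if not nxt:
            break
        j, order = nxt[0]
        out.append(_SYMBOL[order] + symbols[j - 1])
        prev, cur = cur, j
    if len(out) != n_atoms:
        raise ValueError("molfile describes a disconnected structure")
    return "".join(out)


def main() -> None:
    conn = sqlite3.connect(":memory:")
    # Register both converters as one-argument SQL functions.
    conn.create_function("smiles_to_molfile", 1, smiles_to_molfile)
    conn.create_function("molfile_to_smiles", 1, molfile_to_smiles)
    conn.execute("CREATE TABLE atable (smiles TEXT NOT NULL)")
    conn.executemany("INSERT INTO atable VALUES (?)", [("CCO",), ("C=CC#N",)])
    query = ("SELECT smiles, smiles_to_molfile(smiles), "
             "molfile_to_smiles(smiles_to_molfile(smiles)) FROM atable")
    for smiles, molfile, round_trip in conn.execute(query):
        print(smiles, len(smiles), len(molfile), round_trip)


if __name__ == "__main__":
    main()
```

Each printed row pairs a SMILES with the length of its molfile; even for ethanol the molfile runs to hundreds of characters against three for the SMILES, which illustrates the size difference between the two formats. A table could equally store the generated molfile in its own column next to the SMILES column.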