whether the smiles argument is a valid SMILES, regardless of whether it is
canonical or not.
The cansmiles function should also be used to insert each SMILES
when a table is created. For example:
Insert Into structures (cansmi) Values (cansmiles('CC(O)C'));
This ensures that the same standardization is used for storing the data
and for searching the data. It is not sufficient to rely on the various
external programs that can read and write canonical SMILES. Each program
will have a canonical SMILES method that is self-consistent, but it is
likely not identical to other programs' methods. There is, unfortunately,
no universally agreed-upon method to produce canonical SMILES. Once
one method is chosen to implement the cansmiles SQL extension, it is
essential for data integrity to use that method for all database operations
requiring canonical SMILES.
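The store-and-search pattern described above can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the cansmiles function registered here is a hypothetical stand-in backed by a small lookup table (its "canonical" spellings are chosen arbitrarily), not the actual extension, which would call a single cheminformatics toolkit.

```python
import sqlite3

# Hypothetical stand-in for the cansmiles SQL extension: a lookup table that
# maps a few spellings of 2-propanol and ethanol to one arbitrary spelling.
# A real deployment would call one canonicalization method here, and only one.
_CANONICAL = {
    "CC(O)C": "CC(C)O",
    "OC(C)C": "CC(C)O",
    "CC(C)O": "CC(C)O",
    "OCC": "CCO",
    "CCO": "CCO",
}


def cansmiles(smiles):
    """Return the stand-in canonical spelling, or None if it is unknown."""
    return _CANONICAL.get(smiles)


conn = sqlite3.connect(":memory:")
conn.create_function("cansmiles", 1, cansmiles, deterministic=True)
conn.execute("CREATE TABLE structures (cansmi TEXT)")

# Store through cansmiles so every row uses the same standardization.
for smi in ("CC(O)C", "OCC"):
    conn.execute("INSERT INTO structures (cansmi) VALUES (cansmiles(?))", (smi,))

# Search through the same function: a different input spelling still matches.
hits = conn.execute(
    "SELECT cansmi FROM structures WHERE cansmi = cansmiles(?)", ("OC(C)C",)
).fetchall()
print(hits)
```

Because both the insert and the search pass through the same function, the query spelling OC(C)C finds the row that was entered as CC(O)C.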
The cansmiles function can also be used to enforce an SQL constraint
that the cansmi column must contain valid canonical SMILES.
SQL constraints like this are commonly used to maintain data integrity.
For example, the SQL clause check (cansmi = cansmiles(cansmi))
can be used in the initial creation of the table. One might also consider
using an SQL trigger to handle an insert or update to a column that is
required to contain canonical SMILES.
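As a rough sketch of both ideas, the check clause and a trigger can be tried in SQLite through Python's sqlite3 module. The assumptions are the same kind as before: cansmiles is a hypothetical lookup-table stand-in, and the table name structures_auto and the trigger name canonicalize_on_insert are invented for illustration. Trigger syntax differs between database systems, so this shows the idea rather than a portable definition.

```python
import sqlite3

# Hypothetical stand-in for the cansmiles extension (arbitrary spellings).
_CANONICAL = {"CC(O)C": "CC(C)O", "OC(C)C": "CC(C)O", "CC(C)O": "CC(C)O"}


def cansmiles(smiles):
    """Return the stand-in canonical spelling, or None if it is unknown."""
    return _CANONICAL.get(smiles)


conn = sqlite3.connect(":memory:")
# Allow application-defined functions inside schema objects (CHECK, triggers).
conn.execute("PRAGMA trusted_schema = ON")
# SQLite requires functions used in CHECK constraints to be deterministic.
conn.create_function("cansmiles", 1, cansmiles, deterministic=True)

# The check clause from the text, applied when the table is created.
conn.execute(
    "CREATE TABLE structures ("
    " cansmi TEXT NOT NULL CHECK (cansmi = cansmiles(cansmi)))"
)

conn.execute("INSERT INTO structures (cansmi) VALUES ('CC(C)O')")  # accepted
try:
    conn.execute("INSERT INTO structures (cansmi) VALUES ('CC(O)C')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # non-canonical spelling violates the constraint

# Alternative: a trigger that rewrites whatever was inserted.
conn.execute("CREATE TABLE structures_auto (cansmi TEXT)")
conn.execute(
    "CREATE TRIGGER canonicalize_on_insert AFTER INSERT ON structures_auto "
    "BEGIN "
    "  UPDATE structures_auto SET cansmi = cansmiles(NEW.cansmi) "
    "  WHERE rowid = NEW.rowid; "
    "END"
)
conn.execute("INSERT INTO structures_auto (cansmi) VALUES ('CC(O)C')")
auto_rows = conn.execute("SELECT cansmi FROM structures_auto").fetchall()
print(rejected, auto_rows)
```

One caveat of this sketch: the stand-in returns NULL for a string it does not recognize, and a check expression that evaluates to NULL is treated as satisfied. A real cansmiles would need to raise an error on invalid input, or the constraint would need an additional not-null test on the function result.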
If any structures contain stereochemical atomic centers, consider
using the isosmiles function instead of the cansmiles function. The
isosmiles function and isomeric SMILES are discussed in a later section
of this chapter.
Of course, it is possible to use any SMILES to represent a structure
instead of the canonical SMILES. This makes it easier to use various
external methods and programs for creating or drawing input SMILES.
But unless canonical SMILES is used, the direct lookup capability is lost,
or at least made less efficient. For example, one could store any SMILES
in a text column named smiles. A search using the SQL clause where
cansmiles(smiles) = cansmiles('CC(O)C') would work just fine, but
is less efficient than storing the canonical SMILES in the first place. In
some cases, it is desirable to store other SMILES spellings of each structure
in addition to the canonical SMILES. This is a perfectly good practice,
but these additional SMILES columns should be considered as alternate
spellings of the standard canonical SMILES.
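The two search styles can be compared side by side in a small sketch, again using Python's sqlite3 module and a hypothetical lookup-table stand-in for cansmiles. The two-column table layout (smiles for the spelling as entered, cansmi for the canonical form) is an assumption made for illustration.

```python
import sqlite3

# Hypothetical stand-in for the cansmiles extension (arbitrary spellings).
_CANONICAL = {
    "CC(O)C": "CC(C)O",
    "OC(C)C": "CC(C)O",
    "CC(C)O": "CC(C)O",
    "OCC": "CCO",
    "CCO": "CCO",
}


def cansmiles(smiles):
    """Return the stand-in canonical spelling, or None if it is unknown."""
    return _CANONICAL.get(smiles)


conn = sqlite3.connect(":memory:")
conn.create_function("cansmiles", 1, cansmiles, deterministic=True)

# Keep the spelling as entered next to the canonical spelling.
conn.execute("CREATE TABLE structures (smiles TEXT, cansmi TEXT)")
for smi in ("OC(C)C", "OCC"):
    conn.execute(
        "INSERT INTO structures (smiles, cansmi) VALUES (?, cansmiles(?))",
        (smi, smi),
    )

query = "CC(O)C"

# Works, but the stored smiles column is canonicalized at search time.
by_function = conn.execute(
    "SELECT smiles FROM structures WHERE cansmiles(smiles) = cansmiles(?)",
    (query,),
).fetchall()

# Direct lookup against the canonical SMILES stored at insert time.
by_column = conn.execute(
    "SELECT smiles FROM structures WHERE cansmi = cansmiles(?)", (query,)
).fetchall()
print(by_function, by_column)
```

Both queries return the same row. The second compares against a value computed once at insert time, which is the direct lookup the text recommends.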
7.4 SMARTS Representation of Molecular Searches
Using canonical SMILES is a very powerful technique for molecular structure
storage and lookup. However, it is sometimes necessary to perform