aromatic system. Of course, for structures that have no aromatic systems,
the keksmiles is identical to the SMILES input to the function. This
function might be used, for example, to select keksmiles(cansmi) from a
table for processing by an external drawing program.
Some external programs may also need to know exactly how many
hydrogen atoms are attached to each heavy atom. The impsmiles function
produces a SMILES in which the implicit hydrogen count of each atom is
written out explicitly. For example, impsmiles('CC(C)O') returns
[CH3][CH]([CH3])[OH].
As discussed above, hydrogen atoms are handled differently from
other atoms in SMILES and SMARTS. When searching for structures
matching CC, all structures that contain ethane as a substructure will
be found. Of course, this does not mean [CH3][CH3], but rather any two
single-bonded carbon atoms with any number of H atoms attached. One
could be more specific and search for, say, [CH][CH] to require exactly one
H atom on each carbon.
Now consider a more complex case: one where a user draws a phenyl
ring as the query for a substructure search. The drawing program would
produce c1ccccc1. If c1ccccc1 is used, any structure containing a phenyl
ring will be found. The user might have intended to allow all possible
substitutions in all positions on the ring, and indeed this would find those
structures. If a user sketched in an R group (represented as * in SMILES),
most drawing programs would produce c1ccccc1*, unless the user
painstakingly set the hydrogen count on every other atom of the ring. Most
likely, the user intended to require H on all positions except the one with
the *. The intended SMILES would be [cH]1[cH][cH][cH][cH]c1* instead of
c1ccccc1*.
To add these explicit hydrogen counts to SMILES strings, the impsmiles
function works nicely. It produces a SMILES containing all necessary
hydrogen atoms, paying attention to those atoms which have a * atom
attached to them. For example, impsmiles('c1ccccc1*') returns
[cH]1[cH][cH][cH][cH]c1*. The resulting SMILES functions very nicely as a
search SMARTS for use in the matches function.
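
The hydrogen counting behind the two impsmiles examples in the text can be
sketched in a few lines of Python. This is only an illustration, not the
implementation of the impsmiles function itself: it assumes a tiny SMILES
subset (the atoms C, N, O, aromatic c and the wildcard *, the bond symbols
= and #, branches and single-digit ring closures), and it assumes that an
aromatic carbon has three valences left for neighbors and hydrogen. The
tables VALENCE and BOND_ORDER are names chosen for this sketch.

```python
# Toy sketch of implicit-hydrogen counting; NOT a full SMILES parser.
# Assumed valences for the small atom subset supported here.
# 'c' is given 3: one valence is taken up by the aromatic ring system.
VALENCE = {"C": 4, "N": 3, "O": 2, "c": 3}
BOND_ORDER = {"=": 2, "#": 3}


def impsmiles(smiles):
    """Return the SMILES with implicit hydrogen counts written explicitly."""
    tokens = list(smiles)
    used = {}    # token index of an atom -> valence consumed by its bonds
    stack = []   # atoms at which a branch was opened
    ring = {}    # open ring-closure digit -> (atom index, bond order)
    prev = None  # index of the atom the next bond starts from
    order = 1    # order of the pending bond (single unless = or # was seen)
    for i, t in enumerate(tokens):
        if t in VALENCE or t == "*":
            used[i] = 0
            if prev is not None:
                used[prev] += order
                used[i] += order
            prev, order = i, 1
        elif t in BOND_ORDER:
            order = BOND_ORDER[t]
        elif t == "(":
            stack.append(prev)
        elif t == ")":
            prev = stack.pop()
        elif t.isdigit() and prev is not None:
            if t in ring:  # second occurrence of the digit closes the ring
                other, ring_order = ring.pop(t)
                ring_order = max(ring_order, order)
                used[other] += ring_order
                used[prev] += ring_order
            else:          # first occurrence opens the ring
                ring[t] = (prev, order)
            order = 1
        else:
            raise ValueError("unsupported SMILES token: " + t)

    out = []
    for i, t in enumerate(tokens):
        if t in VALENCE:
            h = VALENCE[t] - used[i]
            if h < 0:
                raise ValueError("valence exceeded at position %d" % i)
            if h > 0:
                # Atoms with no hydrogen (and the * atom) stay unbracketed.
                t = "[" + t + "H" + (str(h) if h > 1 else "") + "]"
        out.append(t)
    return "".join(out)


print(impsmiles("CC(C)O"))     # [CH3][CH]([CH3])[OH]
print(impsmiles("c1ccccc1*"))  # [cH]1[cH][cH][cH][cH]c1*
```

Note that the carbon bearing the * is left as a plain c, so only the five
unsubstituted ring positions carry an explicit hydrogen requirement.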
7.8 Input and Output of Molecular Structures
As with all data in an RDBMS, there are an external and an internal
representation of the data. This was discussed in an earlier chapter for
standard data types, such as text and numeric. For molecular structures,
there is of course no SQL standard. When building a database containing
molecular structures, a decision should first be made: which internal
representation will be used, and which external representation.
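
As a minimal sketch of this decision, the Python example below assumes that
SMILES text is the internal representation and that a plain "SMILES name"
line per structure is the external one. SQLite (through Python's sqlite3
module) stands in for the RDBMS here, and the table name structure, its
columns and the helper export_smi are hypothetical names chosen for the
illustration.

```python
import sqlite3

# In-memory database standing in for the RDBMS (an assumption of this sketch).
conn = sqlite3.connect(":memory:")

# Internal representation: the structure is stored as SMILES text.
conn.execute(
    "create table structure (id integer primary key, name text, smiles text)"
)
conn.executemany(
    "insert into structure (name, smiles) values (?, ?)",
    [("isopropanol", "CC(C)O"), ("benzene", "c1ccccc1")],
)


def export_smi(connection):
    """External representation: one 'SMILES name' line per stored structure."""
    rows = connection.execute("select smiles, name from structure order by id")
    return "".join("%s %s\n" % (smiles, name) for smiles, name in rows)


smi_text = export_smi(conn)
print(smi_text, end="")
```

The same table could feed other external formats by swapping the export
helper, while the stored column stays unchanged.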
This chapter focused primarily on SMILES and canonical SMILES. It
is feasible and common to use SMILES as the internal representation of
molecular structure. Using the SQL functions described in this chapter,