Chemistry Reference
It is sometimes necessary to be able to recognize one structure as a
tautomer of another. This could be because a user entered one tautomer
and expects to find data for other tautomers, especially in cases where the
tautomers are in approximately equal abundance under normal labora-
tory conditions. It may even be that data is stored for a compound before
knowing to which tautomer the data refers. In many cases, experimental
data will be measured for a mixture of tautomers, yet it will be assigned
to one tautomer. There is no simple solution for handling tautomers in
SMILES or in a relational database. If two or more structures are tau-
tomers of each other, this might be recorded in another table related to the
table containing the SMILES.
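A related table of this kind might be sketched as follows. This is only a minimal illustration using Python's built-in sqlite3 module; the table names (structure, tautomer_set), the column names, and the helper functions are hypothetical and are not taken from the text.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a structure table and a related
    tautomer table. All table and column names here are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE structure (
            id     INTEGER PRIMARY KEY,
            smiles TEXT NOT NULL UNIQUE
        );
        -- Structures that share a set_id are recorded as tautomers of each other.
        CREATE TABLE tautomer_set (
            set_id       INTEGER NOT NULL,
            structure_id INTEGER NOT NULL UNIQUE REFERENCES structure(id)
        );
    """)
    conn.executemany(
        "INSERT INTO structure (id, smiles) VALUES (?, ?)",
        [(1, "CCCC=O"), (2, "CCC=CO"), (3, "CCCC")],
    )
    # Butyraldehyde (1) and but-1-en-1-ol (2) form one tautomer set.
    conn.executemany(
        "INSERT INTO tautomer_set (set_id, structure_id) VALUES (?, ?)",
        [(1, 1), (1, 2)],
    )
    conn.commit()
    return conn


def tautomers_of(conn, smiles):
    """Return the SMILES of every other structure recorded in the same
    tautomer set as the given SMILES (exact string match only)."""
    rows = conn.execute(
        """
        SELECT s_out.smiles
        FROM structure AS s_in
        JOIN tautomer_set AS t_in  ON t_in.structure_id = s_in.id
        JOIN tautomer_set AS t_out ON t_out.set_id = t_in.set_id
        JOIN structure AS s_out    ON s_out.id = t_out.structure_id
        WHERE s_in.smiles = ? AND s_out.id <> s_in.id
        ORDER BY s_out.smiles
        """,
        (smiles,),
    ).fetchall()
    return [row[0] for row in rows]
```

With this layout, a user who enters one tautomer can be shown the others through a join, while data stored against the structure table remains attached to a single SMILES. The lookup assumes the SMILES strings are stored in a consistent (for example, canonical) form.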
There are several algorithmic approaches to handling tautomers. In
one approach, all possible tautomers are enumerated [12-14] based on a theoretical
understanding of valence bond theory. This leads to a large number
of structures, many of which are not expected to be stable or observable.
This large number of tautomers of each structure would have to be stored
in a database or generated when needed. Neither of these solutions seems
practical. In another approach, a set of rules or transformations for commonly
encountered tautomers is applied [15,16]. This leads to a smaller number
of tautomers. Because they are generated from chemically known
transformations, they form a more reasonable set. These two methods are
useful when attempting to estimate certain physical properties of struc-
tures, such as pKa or logP.
Finally, there are algorithms available for simply recognizing when
two structures are tautomers. This is sufficient to locate all tautomers in a
database. In general, two structures are considered to be structural isomers
if they share the same molecular formula. Tautomers are a special
type of structural isomer in which the connectivity of the heavy (non-hydrogen)
atoms, as well as the molecular formula, is the same. For example, butane
(SMILES: CCCC) and isobutane (SMILES: CC(C)C) are structural isomers but not
tautomers. Butyraldehyde (SMILES: CCCC=O) and but-1-en-1-ol (SMILES: CCC=CO) are
structural isomers as well as tautomers. A direct comparison of the molecular
formulae readily shows the structural isomerism. There is also a text graph
representation that allows easy detection of tautomers.
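The molecular formula comparison mentioned above can be illustrated with a small sketch. It assumes a deliberately restricted SMILES dialect: only unbracketed, non-aromatic atoms of the organic subset, explicit -, = and # bonds, branches, and single-digit ring closures. The function names and the valence table are illustrative assumptions, not part of any SMILES toolkit; a real application would rely on a cheminformatics library.

```python
from collections import Counter

# Normal valences, lowest first, for unbracketed organic-subset atoms (assumed).
VALENCES = {"B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
            "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,)}
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def molecular_formula(smiles):
    """Return a Hill-order formula for a simple, non-aromatic SMILES string."""
    symbols, bond_sum = [], []         # element and total bond order per atom
    branch_stack, open_rings = [], {}  # branch points and pending ring closures
    prev, order, i = None, 1, 0
    while i < len(smiles):
        ch = smiles[i]
        sym = smiles[i:i + 2] if smiles[i:i + 2] in VALENCES else ch
        if sym in VALENCES:
            symbols.append(sym)
            bond_sum.append(0)
            cur = len(symbols) - 1
            if prev is not None:       # bond to the previous atom in the chain
                bond_sum[prev] += order
                bond_sum[cur] += order
            prev, order = cur, 1
            i += len(sym)
            continue
        if ch in BOND_ORDER:
            order = BOND_ORDER[ch]
        elif ch == "(":
            branch_stack.append(prev)
        elif ch == ")":
            prev = branch_stack.pop()
        elif ch.isdigit():
            if ch in open_rings:       # close a ring opened earlier
                other, ring_order = open_rings.pop(ch)
                ring_order = max(ring_order, order)
                bond_sum[prev] += ring_order
                bond_sum[other] += ring_order
            else:
                open_rings[ch] = (prev, order)
            order = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
        i += 1

    counts = Counter(symbols)
    # Implicit hydrogens fill each atom up to its lowest sufficient valence.
    for sym, used in zip(symbols, bond_sum):
        counts["H"] += next((v - used for v in VALENCES[sym] if v >= used), 0)

    elements = sorted(e for e, n in counts.items() if n > 0)
    if "C" in elements:                # Hill order: C, then H, then the rest
        rest = [e for e in elements if e not in ("C", "H")]
        elements = ["C"] + (["H"] if counts["H"] else []) + rest
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in elements)


def same_formula(smiles_a, smiles_b):
    """True when two structures share a molecular formula."""
    return molecular_formula(smiles_a) == molecular_formula(smiles_b)
```

Under these assumptions, CCCC=O and CCC=CO both give C4H8O, and CCCC and CC(C)C both give C4H10. Equal formulae establish only that two structures are isomers; they do not show whether the structures are tautomers.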
SMILES is a graph representation of a molecular structure containing
atom and bond information. Typically, hydrogen atoms are suppressed
and inferred from the typical valence states of the heavy atoms. If
the bond information and aromaticity of atoms are removed from SMILES,
the bonding framework is preserved, but the precise electronic structure
is lost. This is sometimes called a simple molecular graph to distinguish
it from SMILES. For example, CCCCO is the simple graph for butyralde-
hyde as well as its tautomer but-1-en-1-ol. But the simple graph for butane
is CCCC while that for isobutane is CC(C)C. These are not tautomers and
this is shown by their different simple graphs. It should be clear that two