A common problem arises when one uses explicit H atoms in a SMARTS.
For example, the SMARTS C([H])[CH0] contains an explicit H atom. It
will only match SMILES that also contain an explicit H atom. Almost no
database contains such SMILES. For this reason, it is important to
emphasize that the SMARTS C([H])[CH0] does not represent the same
substructure as [CH][CH0]. Unless one carefully designs a database to
include explicit H atoms in every SMILES, explicit H atoms should not
be used in SMARTS. This includes uses of H in an atom list, for
example, C[H,F,Cl]. The H alternative in that list will not match an
implicit hydrogen on a carbon; it matches only if the SMILES was
stored with an explicit H atom.
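As a rough aid, a SMARTS can be checked for explicit H atoms before it
is run against a database. The following Python sketch is not part of
the functions described in this chapter; the name has_explicit_h and
the regular expressions are hypothetical, and the rule they encode
(H counts as an atom only at the start of a bracket expression or
after a comma) is a simplifying assumption that covers the examples
above but not every SMARTS construct.

```python
import re

# One bracket atom expression, e.g. [CH0] or [H,F,Cl] (innermost brackets only).
BRACKET = re.compile(r"\[([^\[\]]*)\]")

# Heuristic (an assumption of this sketch): H denotes a hydrogen atom when it
# begins the bracket expression or follows a comma, optionally after an
# isotope number. The lookahead skips elements such as He or Hg and
# count-style primitives such as H2.
EXPLICIT_H = re.compile(r"(?:^|,)\d*H(?![a-z0-9])")


def has_explicit_h(smarts: str) -> bool:
    """Return True if any bracket atom appears to name H as an atom."""
    return any(EXPLICIT_H.search(m.group(1)) for m in BRACKET.finditer(smarts))


for pattern in ["C([H])[CH0]", "[CH][CH0]", "C[H,F,Cl]", "C[F,Cl]"]:
    print(pattern, has_explicit_h(pattern))
```

Under these assumptions the first and third patterns are flagged and
the other two are not. Ambiguous cases should be checked against the
chemistry toolkit actually in use.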
7.5.2 Aromaticity
Benzene is typically thought of as a combination of two equivalent
resonance structures. These could be written as the SMILES
C1=C-C=C-C=C1 and C1-C=C-C=C-C=1. In order to have just one
representation for benzene and other aromatic systems, SMILES handles
these systems specially, treating the atoms in an aromatic ring as a
special aromatic type and the bonds between them as a special aromatic
type. A lowercase atomic symbol denotes an aromatic atom in SMILES and
SMARTS. The SMILES for benzene then becomes c1ccccc1. A bond between
aromatic atoms is an aromatic bond, unless otherwise spelled out. For
example, biphenyl can be written as c1ccccc1-c1ccccc1, where the
explicit - marks the single bond joining the two rings.
SMARTS patterns do not receive this internal aromatic handling. For
example, matches('c1ccccc1', 'C1=C-C=C-C=C1') is false. This becomes a
problem when the input comes from an external program, such as a
sketching program, that may provide the SMILES or SMARTS for an
aromatic system in any one of its possible resonance forms. To get
around this, convert the SMILES or SMARTS using cansmiles, which will
aromatize the appropriate atoms. For example,
matches('c1ccccc1', cansmiles('C1=C-C=C-C=C1')) is true. However,
cansmiles will fail if its input is not a proper SMILES, for example,
if it contains an atom list such as [F,Cl,Br].
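Why a literal comparison of resonance forms fails, and why
aromatization repairs it, can be seen in a toy model. The Python
sketch below is only an illustration and does not reproduce how
matches or cansmiles work internally. The names ring_bonds and
aromatize are hypothetical, and the rule used (a six-membered ring of
strictly alternating single and double bonds is aromatic) is a
deliberate oversimplification of real aromaticity perception.

```python
from typing import Dict, FrozenSet

Bonds = Dict[FrozenSet[int], str]


def ring_bonds(orders: str) -> Bonds:
    """Map each ring bond {i, i+1 mod n} to its bond symbol ('-' or '=')."""
    n = len(orders)
    return {frozenset((i, (i + 1) % n)): sym for i, sym in enumerate(orders)}


def aromatize(bonds: Bonds) -> Bonds:
    """Relabel a six-membered ring of strictly alternating bonds as aromatic.

    Toy rule for this sketch only; any other ring is returned unchanged.
    """
    n = len(bonds)
    symbols = [bonds[frozenset((i, (i + 1) % n))] for i in range(n)]
    alternating = all(symbols[i] != symbols[(i + 1) % n] for i in range(n))
    if n == 6 and alternating:
        return {bond: ":" for bond in bonds}  # ':' is the aromatic bond symbol
    return dict(bonds)


# Bonds 0-1, 1-2, ..., 5-0 for the two Kekule forms of benzene.
kekule_a = ring_bonds("=-=-=-")  # C1=C-C=C-C=C1
kekule_b = ring_bonds("-=-=-=")  # C1-C=C-C=C-C=1

print(kekule_a == kekule_b)                        # literal bond orders differ
print(aromatize(kekule_a) == aromatize(kekule_b))  # aromatic forms agree
```

The two resonance forms assign different orders to the same bonds, so
they compare as unequal until every ring bond carries the same
aromatic label.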
7.5.3 Tautomers
While the several resonance forms for aromatic systems are neatly
handled using aromatic atom types in SMILES, the issue of multiple
tautomers cannot be handled as neatly. This is a good thing. After
all, it is quite possible to distinguish two different tautomers
experimentally and measure different properties that may need to be
stored in a database. On the other hand, it is not possible to
distinguish two different resonance forms experimentally. Every
equivalent resonance form for a structure ought to be considered to be
the same structure. Resonance is simply a theoretical concept.