mixing, or resonance of states. These two states can be represented using
SMILES as C1=CC=CC=C1 and C1C=CC=CC=1. SMILES resolves this issue
by introducing an extension to simple valence bond theory, namely an aro-
matic bond. The atoms on either end of the bond are referred to as aro-
matic atoms and are represented using the lowercase atomic symbol. So
benzene becomes c1ccccc1 with implied aromatic bonds between the aro-
matic atoms instead of implied single bonds between nonaromatic atoms.
Canonical SMILES performs this aromatization as well as reordering the
atoms into canonical order. A separate unimolecular transformation is not
necessary.
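The relationship between the two Kekulé SMILES and the single aromatic form can be illustrated with a toy Python sketch. It is not a SMILES parser: it assumes only the restricted strings shown above (carbon atoms, optional `=` bonds, one ring closure labelled 1), and the function names `ring_bond_orders` and `aromatic_smiles` are hypothetical. The alternating-bond test stands in for the much more general aromaticity perception that a real canonical SMILES implementation would perform.

```python
def ring_bond_orders(smiles):
    """Return the bond orders around a single carbon ring, in ring order.

    Toy reader: accepts only 'C', '=' and the ring-closure digit '1'.
    """
    orders = []
    pending = 1          # order of the next bond; '=' raises it to 2
    atoms_seen = 0
    closures_seen = 0
    for ch in smiles:
        if ch == "=":
            pending = 2
        elif ch == "C":
            if atoms_seen > 0:
                orders.append(pending)   # bond from the previous atom
            atoms_seen += 1
            pending = 1
        elif ch == "1":
            closures_seen += 1
            if closures_seen == 2:
                orders.append(pending)   # ring-closing bond back to the first atom
            pending = 1
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return orders


def aromatic_smiles(smiles):
    """Write a six-membered ring of alternating single/double bonds as c1ccccc1."""
    orders = ring_bond_orders(smiles)
    n = len(orders)
    alternating = set(orders) == {1, 2} and all(
        orders[i] != orders[(i + 1) % n] for i in range(n)
    )
    if n == 6 and alternating:
        return "c1ccccc1"
    return smiles  # leave anything else untouched


kekule_a = "C1=CC=CC=C1"
kekule_b = "C1C=CC=CC=1"
print(ring_bond_orders(kekule_a))  # [2, 1, 2, 1, 2, 1]
print(ring_bond_orders(kekule_b))  # [1, 2, 1, 2, 1, 2]
print(aromatic_smiles(kekule_a) == aromatic_smiles(kekule_b))  # True
```

The two inputs differ only in which ring bonds are written as double, and both collapse to the same lowercase string, which is the point of the aromatic notation.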
A more difficult issue arises for atoms that have different valence
states. For example, nitrogen is typically considered to have a valence
state of 3. The valence state of an atom is defined as the sum of the bond
orders to the atom, minus the formal charge. So, ammonia has three
single bonds and the nitrogen has valence state 3. Hydrogen cyanide
has a triple bond to the nitrogen, again resulting in a valence state of
3. A nitrogen atom with a single bond and a double bond also yields a
valence state of 3. Finally, the ammonium cation has four single bonds,
but with a +1 formal charge on the nitrogen, yielding a valence state of
3. In some cases, it is desirable to consider nitrogen to have a valence
state of 5. One common example is the nitro group, an example of which
is CN(=O)(=O) in nitromethane. In this representation, the nitrogen has
a valence state of 5. However, one might also use the SMILES
C[N+](=O)[O-], which shows nitrogen in the more common valence state 3.
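The valence-state arithmetic in this paragraph can be checked with a few lines of Python. This is a minimal sketch of the definition given above (sum of bond orders minus formal charge); the function name `valence_state` and the way each nitrogen is described as a list of bond orders are assumptions made for illustration, and hydrogens are counted as explicit single bonds.

```python
def valence_state(bond_orders, formal_charge=0):
    """Sum of the bond orders to an atom, minus its formal charge."""
    return sum(bond_orders) - formal_charge


# (bond orders to the nitrogen, formal charge on the nitrogen)
nitrogen_examples = {
    "ammonia": ([1, 1, 1], 0),
    "hydrogen cyanide": ([3], 0),
    "one single and one double bond": ([1, 2], 0),
    "ammonium cation": ([1, 1, 1, 1], +1),
    "nitro, CN(=O)(=O)": ([1, 2, 2], 0),
    "nitro, C[N+](=O)[O-]": ([1, 2, 1], +1),
}

for name, (orders, charge) in nitrogen_examples.items():
    print(f"{name}: valence state {valence_state(orders, charge)}")
```

Every entry gives 3 except the uncharged nitro form, which gives 5.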
Which SMILES is better? Unfortunately, there is no generally agreed-
upon answer. Some prefer the charge-separated form because it reflects
the more common valence state of nitrogen. Others prefer the former
SMILES because it does not introduce formal atomic charges. In a sense,
the answer is unimportant and is just a theoretical argument. Yet in a
real-world database, it is important to have a consistent representation of
any unique molecular structure.
One way to resolve this issue in a database is to require one particular
form for the nitro group. Putting the burden on the chemist who inputs the
structures is possible, but when hundreds or thousands of structures need
to be imported, say from a vendor or other library, examining and cor-
recting hundreds of individual structures is not feasible. Using a SMIRKS
transformation can easily solve this problem.
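What such a transformation does to a structure can be sketched in plain Python as a small graph rewrite. This is an illustration only, not a SMIRKS engine and not the database function itself: the `Mol` class, the `neutralize_nitro` name, and the choice of the uncharged valence 5 form as the target are all assumptions made for the sketch, and only heavy atoms are stored.

```python
from dataclasses import dataclass


@dataclass
class Mol:
    symbols: list   # element symbol per atom index
    charges: list   # formal charge per atom index
    bonds: dict     # frozenset({i, j}) -> bond order


def neutralize_nitro(mol):
    """Rewrite each O=[N+][O-] group as O=N=O in place; return how many were changed."""
    changed = 0
    for n, symbol in enumerate(mol.symbols):
        if symbol != "N" or mol.charges[n] != 1:
            continue
        # Collect (neighbor index, bond order) for every bond that touches this nitrogen.
        neighbors = [
            (next(iter(pair - {n})), order)
            for pair, order in mol.bonds.items()
            if n in pair
        ]
        double_o = [a for a, order in neighbors
                    if mol.symbols[a] == "O" and order == 2 and mol.charges[a] == 0]
        anion_o = [a for a, order in neighbors
                   if mol.symbols[a] == "O" and order == 1 and mol.charges[a] == -1]
        if double_o and anion_o:
            o = anion_o[0]
            mol.charges[n] = 0                     # [N+] becomes neutral N
            mol.charges[o] = 0                     # [O-] becomes neutral O
            mol.bonds[frozenset((n, o))] = 2       # N-O single bond becomes N=O
            changed += 1
    return changed


# Nitromethane in charge-separated form, heavy atoms only: C0, N1, O2, O3.
nitromethane = Mol(
    symbols=["C", "N", "O", "O"],
    charges=[0, 1, 0, -1],
    bonds={frozenset((0, 1)): 1, frozenset((1, 2)): 2, frozenset((1, 3)): 1},
)
print(neutralize_nitro(nitromethane))   # 1
print(nitromethane.charges)             # [0, 0, 0, 0]
```

Running the rewrite over every imported structure gives one consistent nitro representation without anyone editing structures by hand. A second pass over an already rewritten molecule finds nothing to change.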
Suppose it is decided that the valence 5, non-charge-separated representation
of the nitro group is to be used throughout the database. The
SMIRKS [O:2]=[N+:1][O-:3]>>[O:2]=[N+0:1]=[O+0:3], when applied to any
charge-separated nitro group, will transform it into the chosen form. This
is accomplished by creating another new SQL function, xform(smiles,
smarts). As with the cansmiles and matches functions, this is an
extension to standard SQL. Some form of this transformation function is