7.5.8 InChI and Canonical SMILES
Canonical SMILES is a powerful tool for encoding a molecular struc-
ture as a character string, especially for use in relational database tables.
Unfortunately, there is no universally accepted algorithm for produc-
ing canonical SMILES. For example, the canonical SMILES produced by
OpenEye may not be the same as that produced by Daylight, ChemAxon,
or ChemDraw. This is generally not an issue, as long as the same software
is consistently used for creating, storing, and searching canonical SMILES.
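The consistency requirement can be sketched in a few lines. This is only an illustration under stated assumptions: toy_cansmiles is a hypothetical stand-in (a small lookup table) for whatever canonicalizer a real toolkit provides, and the table and function names are invented for this example. The point is that the same canonicalizer is applied both when storing and when searching.

```python
import sqlite3

# Hypothetical stand-in for a real toolkit's canonicalizer: it maps a few
# valid spellings of ethanol and methanol to one chosen form.
_TOY_CANONICAL = {
    "OCC": "CCO", "C(O)C": "CCO", "CCO": "CCO",
    "OC": "CO", "CO": "CO",
}

def toy_cansmiles(smiles):
    """Return the 'canonical' form chosen by this toy canonicalizer."""
    return _TOY_CANONICAL[smiles]

def make_table():
    """Create an in-memory table keyed by canonical SMILES."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE structure (name TEXT, cansmiles TEXT UNIQUE)")
    return conn

def add_structure(conn, name, smiles, canonicalize=toy_cansmiles):
    """Canonicalize the input SMILES before storing it."""
    conn.execute("INSERT INTO structure VALUES (?, ?)",
                 (name, canonicalize(smiles)))

def find_structure(conn, smiles, canonicalize=toy_cansmiles):
    """Canonicalize the query the same way, then do an exact string match."""
    row = conn.execute("SELECT name FROM structure WHERE cansmiles = ?",
                       (canonicalize(smiles),)).fetchone()
    return row[0] if row else None

conn = make_table()
add_structure(conn, "ethanol", "OCC")
print(find_structure(conn, "C(O)C"))  # ethanol
# Searching with a different "canonicalizer" (here, none at all) misses the row.
print(find_structure(conn, "OCC", canonicalize=lambda s: s))  # None
```

The second lookup fails even though OCC is a valid SMILES for ethanol, because the stored string was produced by one canonicalizer and the query by another.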
If there were one universal canonical SMILES “name” for a molecular
structure, it would be possible to use this canonical SMILES in any web
document. This would greatly help lookups across the web, allowing a
simple string search to find exact molecular structures.
Recently, a universal string representation method was proposed and
published. The International Chemical Identifier,17 or InChI™, is a defini-
tion and set of methods maintained by the International Union of Pure
and Applied Chemistry. It promises to provide a truly universal character
string representation of molecular structure. Whether it will replace the
widely used SMILES is yet to be seen.
7.6 SMILES and Inorganic Structures
All the examples in this chapter have been organic structures. SMILES is
not limited to storing organic structures. Every atom in the periodic table can
be equally well represented. However, the “organic atoms” are handled
specially in SMILES. Every atom in a SMILES can be represented using
the atomic symbol in brackets, for example [C], [U], or [Na]. But the atoms
B, C, N, O, S, P, F, Cl, Br, and I can be used without brackets. When used
without brackets, SMILES assumes the lowest normal valency for these
atoms. For example, formaldehyde is written as C=O, but carbon mon-
oxide is written as [C]=[O] or [C]=O. It could be argued that the correct
SMILES for carbon monoxide is [C-]#[O+]. But this argument diverges
into valence bond theory, which will not be further discussed here. See
also the section above about valence in SMILES.
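The difference between bracketed and unbracketed atoms can be made concrete with a small sketch. It assumes the usual normal valences for the unbracketed atoms (B 3, C 4, N 3 or 5, O 2, P 3 or 5, S 2, 4, or 6, halogens 1) and treats a bracket atom that lists no hydrogens, such as [C], as having none. The function name implicit_hydrogens is invented for this illustration and is not part of any toolkit.

```python
# Normal valences for atoms that may be written without brackets.
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}

def implicit_hydrogens(symbol, bond_order_sum, bracketed=False):
    """Hydrogens implied for one atom, given the sum of its written bond orders.

    Bracket atoms get only the hydrogens listed inside the brackets; this
    sketch covers the case where none are listed, so the answer is 0.
    """
    if bracketed:
        return 0
    if symbol not in NORMAL_VALENCES:
        raise ValueError(f"{symbol!r} must be written in brackets")
    # Use the lowest normal valence that accommodates the written bonds.
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0  # more bonds than any normal valence: no hydrogens are added

# Formaldehyde, C=O: carbon is filled up to valence 4, oxygen is already at 2.
print([implicit_hydrogens("C", 2), implicit_hydrogens("O", 2)])  # [2, 0]
# Carbon monoxide, [C]=O: the bracketed carbon receives no hydrogens.
print([implicit_hydrogens("C", 2, bracketed=True), implicit_hydrogens("O", 2)])  # [0, 0]
```

Under these assumptions C=O carries two hydrogens on carbon (CH2O), while [C]=O carries none (CO), which is why the brackets are needed for carbon monoxide.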
7.7 Other SMILES Extensions
Some external programs do not use the aromatic model for SMILES and
prefer the so-called Kekulé form of the SMILES. This is not a canonical
SMILES but can be useful for export to a drawing program, if users
prefer to see alternating double bonds in aromatic ring systems. A Kekulé
SMILES might even be necessary for programs that do not handle
aromatic atoms in the same way as described here. The keksmiles
function computes one of the many valid resonance structures for an