7.9 Useful SQL Extensions
Several new SQL functions have been introduced here. These functions
make it possible to store molecular structures in an RDBMS as text strings.
They also allow these strings to be manipulated and searched in a chemically
meaningful way. This greatly expands the usefulness of an RDBMS
for chemical applications. Table 7.2 summarizes these functions using
SQL notation for defining functions. Much of the rest of this topic will
describe more useful functions and ways of using and extending
these even further.
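As a rough illustration of the idea, the sketch below stores SMILES strings as text and calls a user-defined function from SQL. It makes several assumptions that are not in the original: it uses Python's built-in sqlite3 module in place of PostgreSQL, the table name structure and its sample rows are hypothetical, and the valid function here is only a toy stand-in that checks bracket, parenthesis, and ring-closure pairing. A real implementation would delegate to a cheminformatics toolkit.

```python
import sqlite3


def valid(smiles):
    """Toy stand-in for a real SMILES validator (an assumption for this sketch).

    It only checks that square brackets and parentheses balance and that
    single-digit ring-closure labels occur in pairs. It does not check
    element symbols, valence, or aromaticity.
    """
    if not smiles:
        return 0
    depth = 0            # current nesting depth of branch parentheses
    open_rings = set()   # ring-closure digits opened but not yet closed
    in_bracket = False   # True while inside a [bracket atom]
    for ch in smiles:
        if ch == "[":
            if in_bracket:
                return 0
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                return 0
            in_bracket = False
        elif in_bracket:
            continue     # digits inside brackets are isotopes, counts, or charges
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return 0
        elif ch in "0123456789":
            open_rings ^= {ch}   # toggle: first occurrence opens, second closes
    return int(depth == 0 and not open_rings and not in_bracket)


conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL under the name "valid".
conn.create_function("valid", 1, valid)

conn.execute("CREATE TABLE structure (name TEXT, smiles TEXT)")
conn.executemany(
    "INSERT INTO structure VALUES (?, ?)",
    [
        ("benzene", "c1ccccc1"),
        ("isobutane", "CC(C)C"),
        ("broken ring", "C1CCCC"),   # ring label 1 is never closed
    ],
)

rows = conn.execute(
    "SELECT name FROM structure WHERE valid(smiles) ORDER BY name"
).fetchall()
valid_names = [row[0] for row in rows]
print(valid_names)
```

The point of the sketch is the pattern, not the check itself: once a function is registered with the database, chemical tests can appear directly in a WHERE clause alongside ordinary SQL conditions.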
The Appendix of this topic shows three complete implementations of
these functions using PostgreSQL and PerlMol, FROWNS, and OpenBabel
modules. Each of these three modules is free and open source. Using these
functions is an excellent way to become familiar with the concepts in this
chapter. It is possible to extend these functions even further to take advantage
of other features of PerlMol, FROWNS, and OpenBabel to satisfy the
needs of many molecular modeling projects. However, each of these three
modules has limitations. Before embarking on a large, complex database
project, thoroughly examine the limitations of PerlMol, FROWNS,
and OpenBabel. One important distinction between these
three modules is how they generate canonical SMILES. Each one generates
valid canonical SMILES, but each produces different canonical SMILES.
This is simply due to differing algorithms for canonically ordering atoms.
As discussed earlier, there is no universal canonical SMILES.
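One practical consequence is that canonical SMILES are only comparable as strings when they come from the same generator. The sketch below illustrates one way to guard against mixing them, under stated assumptions: the toolkit column, the labels toolkit_a and toolkit_b, and the lookup helper are hypothetical, sqlite3 stands in for PostgreSQL, and the two strings are simply two valid SMILES for ethanol chosen for illustration, not the actual output of any particular module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE structure (
        name      TEXT,
        toolkit   TEXT,   -- hypothetical label for whichever module generated cansmiles
        cansmiles TEXT,
        UNIQUE (toolkit, cansmiles)
    )
    """
)

# Illustrative strings only: both are valid SMILES for ethanol, standing in
# for the differing canonical forms that two generators might produce.
conn.executemany(
    "INSERT INTO structure VALUES (?, ?, ?)",
    [
        ("ethanol", "toolkit_a", "CCO"),
        ("ethanol", "toolkit_b", "OCC"),
    ],
)


def lookup(toolkit, cansmiles):
    """Exact-match lookup restricted to strings from one generator."""
    rows = conn.execute(
        "SELECT name FROM structure WHERE toolkit = ? AND cansmiles = ?",
        (toolkit, cansmiles),
    ).fetchall()
    return [row[0] for row in rows]


same_toolkit = lookup("toolkit_a", "CCO")    # string produced and queried consistently
cross_toolkit = lookup("toolkit_a", "OCC")   # string from the other generator
print(same_toolkit, cross_toolkit)
```

The cross-toolkit lookup finds nothing even though both rows describe the same molecule, which is why a query string should be canonicalized by the same module that populated the column.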
Table 7.2 summarizes the core functions used throughout the rest of
this topic. There are several commercially available chemical extensions
to SQL. The functions these vendors provide may not correspond exactly
to the functions in Table 7.2.
Table 7.2 Core Chemical SQL Extension Functions

Function            Input type   Output type     Description of output
valid               Text         Boolean         Tests whether SMILES is valid
cansmiles           Text         Text            Canonical form of SMILES
isosmiles           Text         Text            Isomeric form of SMILES
keksmiles           Text         Text            Kekule form of SMILES
matches             Text, text   Boolean         Tests whether SMILES (arg #1) matches SMARTS
count_matches       Text, text   Integer         Number of times SMARTS matches SMILES (arg #1)
list_matches        Text, text   Integer array   Atoms in SMILES (arg #1) which match SMARTS
smiles_to_molfile   Text         Text            Molfile formatted string
molfile_to_smiles   Text         Text            SMILES
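For readers who think in terms of typed interfaces, the signatures in Table 7.2 can be restated as a Python protocol. This is only a sketch: the class name ChemFunctions and the parameter names are hypothetical, the mapping of SQL types to Python types (text to str, boolean to bool, integer array to a list of int) is an assumption, and no method bodies are implied.

```python
from typing import List, Protocol


class ChemFunctions(Protocol):
    """Hypothetical interface mirroring the input and output types of Table 7.2."""

    def valid(self, smiles: str) -> bool: ...

    def cansmiles(self, smiles: str) -> str: ...

    def isosmiles(self, smiles: str) -> str: ...

    def keksmiles(self, smiles: str) -> str: ...

    # The two-argument functions take the SMILES first and the SMARTS second.
    def matches(self, smiles: str, smarts: str) -> bool: ...

    def count_matches(self, smiles: str, smarts: str) -> int: ...

    def list_matches(self, smiles: str, smarts: str) -> List[int]: ...

    def smiles_to_molfile(self, smiles: str) -> str: ...

    def molfile_to_smiles(self, molfile: str) -> str: ...
```

Any backend, whether built on PerlMol, FROWNS, OpenBabel, or a commercial extension, could in principle be wrapped to satisfy such an interface, which makes it easier to see where a vendor's functions differ from the core set.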