Chemistry Reference
In-Depth Information
Operators, such as +, *, || and functions such as sqrt, round, and upper
can be used with these data types. SQL has the ability to search data,
using functions such as =, <, and the like. The goal of the SQL extensions
is to enable SMILES to be handled as readily as any standard data type.
This requires that SQL be extended to validate and standardize, or canon-
icalize SMARTS. In addition, these SQL extensions provide functions
and operators to allow comparisons and searches of molecular structures
stored as SMILES.
Complete molecular structures can be stored as canonical SMILES in
a text column. A structure can be located using an SQL clause such as
where cansmi = 'CC(C)O' . But, writing an SQL clause like this requires
knowing the standard way of spelling the canonical SMILES for the
search structure. External programs could be used to generate the canoni-
cal SMILES, but extending SQL to generate the canonical SMILES is a bet-
ter approach. First, consider how text data is typically used in a relational
database.
Suppose a text column is used to store researcher's names. A good
database design requires that data be represented in some standard
way. In this case, upper and lower case standardization might be used.
One common standard requires that the data be stored in lowercase.
When an SQL search clause is written, the lowercase string would be
used. For example, where name = 'einstein' would find rows stor-
ing data about einstein, but where name = 'EINSTEIN' would find
no rows. It is burdensome to rely on users or programmers to always
type the search name in lower case. When taking input from various
sources, mixed upper and lower case data will invariably be encoun-
tered. The lower function of SQL makes using the lowercase standard
easy. For example, Where name = lower('Einstein') or Where name
= lower('EINSTEIN') would each find the intended rows.
If canonical SMILES are stored in a text column, a direct lookup is pos-
sible. But relying on users or programmers to know the canonical spelling
of every potential search query is not advised. To use canonical SMILES
in a way analogous to the lowercase example above, a cansmiles SQL
function is proposed. Using this function, it becomes possible to use either
the clause where cansmi = cansmiles('C#N') or where cansmi =
cansmiles('N#C') . Either of these SQL clauses would find the hydrogen
cyanide rows in a table that stores canonical SMILES in a column named
cansmi. A new function, such as the cansmiles function described here
is commonly called an SQL extension, since it extends the capabilities of
standard SQL. The cansmiles (or equivalent) function is available in
database extensions from gNova, 7 Daylight, 8 and others. 9 The Appendix of
this topic shows how to create a cansmiles extension function using the
open source database PostgreSQL and open source python or perl mod-
ules. A related function, named valid returns true or false depending on
Search WWH ::




Custom Search