Chemistry Reference
In-Depth Information
NEW.fp = fp(NEW.smi);
Return NEW;
End;
$EOSQL$ Language plpgsql;
Create Trigger add_new_structure Before Insert Or Update On
structure For Each Row Execute Procedure add_new_structure();
Create Index cansmi_index On structure (cansmi);
The first statement creates a new schema to contain these tables. The
schema name is arbitrary but might be chosen to be the name of the com-
pany or research organization. Next, the
structure
table is defined to
contain a
smiles
column of type
text
. This column is defined to be
unique
and
not
null
. The uniqueness constraint here ensures that no
duplicate compounds can be entered. The
id
column is defined using
the
serial
data type. This ensures that a unique integer number will be
associated with each structure. This
id
will be used in other tables within
this schema to relate data in those tables to compounds in the
structure
table. The
cansmi
column will be used to contain canonical simplified
molecular input line entry system (SMILES). The
fp
column will be used
to contain a bit string fingerprint of the structure. The
cansmi
and
fp
column can be used when searching for compounds in this table.
The next statement defines a
trigger
function that will be used when-
ever data is inserted or updated in this table. This function performs three
important functions. First, it modifies the SMILES to be inserted into the
smi
column so that it contains the result of the
isosmiles
function. The
isosmiles
function is similar to the
cansmiles
function, except that it
retains any stereochemistry that might be contained within the SMILES.
If two stereoisomers are entered into this table, each will have a unique
isosmiles
value, but the same
cansmiles
value. In this way, they can
be kept distinct, but their identical canonical SMILES shows them to be
stereoisomers. The
trigger
function also computes the fingerprint and
inserts it into the table when the SMILES is inserted or updated.
The use of the uniqueness constraint and the
trigger
ensures integ-
rity of the data in this table. However, it does not automatically correct all
problems. For example, if the insertion of invalid SMILES is attempted,
an error will be generated and the SMILES will not be inserted. It is the
responsibility of the application program to deal with this error. If the
application is a batch style application, there error must certainly be logged
so that it can be dealt with at a later time. If it is an interactive application,
the user is informed of the error and asked to re-enter the structure. If the
SMILES is valid, the
isosmiles
,
cansmiles
, and
fp
function should not
fail. However, if there is some fault in those functions, the overall insert
or update will not occur. For this reason, it might be desirable to log any
error that occurs. These errors might be available in the log file of the
RDBMS server, depending on how that server is configured. Alternatively,
Search WWH ::
Custom Search