Chemistry Reference
In-Depth Information
NEW.fp = fp(NEW.smi);
Return NEW;
End;
$EOSQL$ Language plpgsql;
Create Trigger add_new_structure Before Insert Or Update On
structure For Each Row Execute Procedure add_new_structure();
Create Index cansmi_index On structure (cansmi);
The first statement creates a new schema to contain these tables. The
schema name is arbitrary but might be chosen to be the name of the com-
pany or research organization. Next, the structure table is defined to
contain a smiles column of type text . This column is defined to be
unique and not null . The uniqueness constraint here ensures that no
duplicate compounds can be entered. The id column is defined using
the serial data type. This ensures that a unique integer number will be
associated with each structure. This id will be used in other tables within
this schema to relate data in those tables to compounds in the structure
table. The cansmi column will be used to contain canonical simplified
molecular input line entry system (SMILES). The fp column will be used
to contain a bit string fingerprint of the structure. The cansmi and fp
column can be used when searching for compounds in this table.
The next statement defines a trigger function that will be used when-
ever data is inserted or updated in this table. This function performs three
important functions. First, it modifies the SMILES to be inserted into the
smi column so that it contains the result of the isosmiles function. The
isosmiles function is similar to the cansmiles function, except that it
retains any stereochemistry that might be contained within the SMILES.
If two stereoisomers are entered into this table, each will have a unique
isosmiles value, but the same cansmiles value. In this way, they can
be kept distinct, but their identical canonical SMILES shows them to be
stereoisomers. The trigger function also computes the fingerprint and
inserts it into the table when the SMILES is inserted or updated.
The use of the uniqueness constraint and the trigger ensures integ-
rity of the data in this table. However, it does not automatically correct all
problems. For example, if the insertion of invalid SMILES is attempted,
an error will be generated and the SMILES will not be inserted. It is the
responsibility of the application program to deal with this error. If the
application is a batch style application, there error must certainly be logged
so that it can be dealt with at a later time. If it is an interactive application,
the user is informed of the error and asked to re-enter the structure. If the
SMILES is valid, the isosmiles , cansmiles , and fp function should not
fail. However, if there is some fault in those functions, the overall insert
or update will not occur. For this reason, it might be desirable to log any
error that occurs. These errors might be available in the log file of the
RDBMS server, depending on how that server is configured. Alternatively,
Search WWH ::




Custom Search