Chemistry Reference
In-Depth Information
If a substructure search is desired, it is wise to use the fingerprint stored
in the fp column to reduce the number of structures that must be scanned
using the matches function. The following SQL will locate all structures
that contain the specified substructure.
Select id, smi From structure Where
contains(fp, fp('c1ccccc1C(=O)NC')) And
matches(smi, 'c1ccccc1C(=O)NC');
The addition of the contains function allows a quick comparison of
the fingerprint of the desired substructure with the fingerprints stored in
the table. The matches function is then needed only for structures that
have passed this initial test. Since the matches function is slower than
the contains function, the overall search is faster than it would be if the
fingerprint comparison were not done.
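The screening idea can be illustrated without any chemistry extension. The sketch below assumes that fingerprints are fixed-length bit strings and that a fingerprint "contains" a query when every bit set in the query is also set in the fingerprint. The table demo_fp and its 8-bit values are made up for illustration; they are not the fp column or the contains function used above. The syntax assumes PostgreSQL's bit string type and its & operator.

```sql
-- Hypothetical table: 8-bit fingerprints standing in for real ones.
Create Table demo_fp (id integer Primary Key, fp bit(8));
Insert Into demo_fp Values
  (1, B'10110010'),
  (2, B'00110000'),
  (3, B'11111111');

-- The query fingerprint holds the bits the substructure requires.
-- A row passes the screen when (fp & query) = query, that is,
-- when it has every bit that the query has.
Select id From demo_fp
 Where (fp & B'10010000') = B'10010000';
```

Rows 1 and 3 pass this screen and row 2 does not. Passing the screen only means a structure might contain the substructure, which is why the slower matches test is still applied to the survivors.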
It might be tempting to add additional columns to the structure table
to hold defined properties of each structure. Not all properties of a
structure are appropriate for a table of structures. Some properties, for example,
molecular weight and molecular formula, are fixed properties of a structure
with a unique value. These might be added as columns to the structure
table. However, they could also be kept in another table related to the
structure table. Consider also how often these values will be needed and whether they
will be searched. These properties can also be computed easily when
needed, using SQL functions that take a SMILES argument.
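One of the options above, keeping single-valued properties in a related table, might look like the following sketch. The properties table, its column names, and the sample row are assumptions made for illustration, and the structure table is reduced to two columns so the example stands alone in a scratch database.

```sql
-- Minimal structure table for this sketch (the fp column is omitted).
Create Table structure (id integer Primary Key, smi text);

-- Hypothetical properties table: at most one row per structure,
-- because cid is both the primary key and a foreign key.
Create Table properties (
  cid integer Primary Key References structure (id),
  mw numeric,
  formula text
);

Insert Into structure Values (1, 'c1ccccc1');
Insert Into properties Values (1, 78.11, 'C6H6');

-- Search on a stored property and report the structure with it.
Select smi, mw, formula
  From structure Join properties On (id=cid)
 Where mw < 100;
```

Storing the values makes them easy to search and index. Computing them on demand avoids keeping a second copy that must stay in step with the SMILES.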
Other properties are not unique, for example, chemical names. These
should be stored in a separate table with one row for each value. For
example, the entry in the PubChem database contains 10 synonyms for the
SMILES C1(C(C(C(C(C1O)O)OP(=O)(O)O)O)O)O as shown in Table 13.1.
Each of these should be entered as a separate row in a table of names
along with a column containing the compound id. A simple table of this
type would be created using the following SQL.
Create Table names (cid integer References structure (id), name text);
The cid column is a foreign key referencing the id column of the structure
table. This prevents any names from being entered that do not have a
corresponding entry in the structure table. It also associates the name with the
proper structure. As shown in earlier chapters, names and SMILES can be
selected from the tables in this schema using the following SQL.
Select smi, name From structure Join names On (id=cid);
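A small worked example may make the behavior of the foreign key and the join concrete. The rows below are illustrative only, the structure table is reduced to two columns, and the sketch assumes a database such as PostgreSQL that enforces foreign keys by default.

```sql
-- Minimal versions of the two tables (the fp column is omitted).
Create Table structure (id integer Primary Key, smi text);
Create Table names (cid integer References structure (id), name text);

Insert Into structure Values (1, 'c1ccccc1');

-- Several names for one structure: one row per name.
Insert Into names Values (1, 'benzene');
Insert Into names Values (1, 'benzol');

-- The next statement would be rejected with a foreign key violation,
-- because no row in structure has id 2. It is left commented out
-- so that the script runs to completion.
-- Insert Into names Values (2, 'toluene');

-- All names recorded for one SMILES.
Select smi, name
  From structure Join names On (id=cid)
 Where smi = 'c1ccccc1';
```

The final query returns two rows, one for each name. Comparing smi as text in this way only finds the structure if SMILES are stored in a consistent, canonical form.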
Any number of other tables can be added to this schema. Each should be
related to the structure table using the compound id. Aside from simply
registering compounds, it might be required to store experimental data