Applications - Design and Use of Relational Databases in Chemistry


If a substructure search is desired, it is wise to use the fingerprint stored in the fp column to reduce the number of structures that must be scanned using the matches function. The following SQL will locate all structures that contain the specified substructure.

Select id, smi From structure Where
contains(fp, fp('c1ccccc1C(=O)NC')) And
matches(smi, 'c1ccccc1C(=O)NC');

The addition of the contains function allows a quicker comparison of the fingerprint of the desired substructure with the fingerprints stored in the table. The matches function is then used only for structures that have passed this initial test. Since the matches function is slower than the contains function, the overall search is faster than it would be without the fingerprint comparison.
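The screen-then-verify idea can be sketched outside the database. The Python below is only a toy under stated assumptions: the fingerprint is built from character 3-grams of the SMILES string, and plain substring search stands in for the real matches function (true substructure matching works on the molecular graph, not on the text). The names fingerprint, contains, matches, and search are illustrative and are not the SQL functions used in this chapter.

```python
import zlib

def fingerprint(text: str, n: int = 3, bits: int = 64) -> int:
    """Toy fingerprint: set one bit for each character n-gram in the text."""
    fp = 0
    for i in range(len(text) - n + 1):
        # crc32 is deterministic across runs, unlike the built-in hash()
        fp |= 1 << (zlib.crc32(text[i:i + n].encode()) % bits)
    return fp

def contains(fp_target: int, fp_query: int) -> bool:
    """Cheap screen: every bit set in the query is also set in the target."""
    return (fp_target & fp_query) == fp_query

def matches(text: str, query: str) -> bool:
    """Stand-in for the slow exact test (plain substring search here)."""
    return query in text

def search(rows, query):
    """Return matching ids and the number of slow matches() calls made."""
    fp_query = fingerprint(query)
    hits, slow_calls = [], 0
    for id_, smi, fp in rows:
        if not contains(fp, fp_query):
            continue  # screened out without running the slow test
        slow_calls += 1
        if matches(smi, query):
            hits.append(id_)
    return hits, slow_calls

structures = [(1, "c1ccccc1C(=O)NC"), (2, "CCO"), (3, "CC(=O)O"), (4, "c1ccccc1")]
# Fingerprints are computed once and stored, like the fp column in the table.
rows = [(id_, smi, fingerprint(smi)) for id_, smi in structures]
hits, slow_calls = search(rows, "C(=O)N")
print(hits, slow_calls)
```

Because every 3-gram of a substring is also a 3-gram of the full string, the screen can pass a non-match (a false positive that matches then rejects) but can never discard a true match. The same property is what makes a fingerprint screen safe to place in front of the slower test.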

It might be tempting to add additional columns to the structure table to hold defined properties of each structure. Not all properties of a structure are appropriate for a table of structures. Some properties, for example molecular weight and molecular formula, are fixed properties of a structure with a unique value. These might be added as columns to the structure table. However, they could also be kept in another table related to the structure table. Consider also how often these values will be needed and whether they will be searched. These properties can also be computed easily when needed, using SQL functions that take a SMILES argument.
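As a rough illustration of computing a property on demand rather than storing it, the sketch below registers a Python function with SQLite so that it can be called from SQL with a SMILES argument. The function heavy_atoms is hypothetical and deliberately crude: it assumes organic-subset SMILES with no bracket atoms and simply counts atom symbols. A real molecular weight or formula function would come from a cheminformatics toolkit.

```python
import re
import sqlite3

# Assumption: organic-subset SMILES only, no bracket atoms such as [Na+].
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")

def heavy_atoms(smi: str) -> int:
    """Count non-hydrogen atoms in a simple SMILES string."""
    return len(ATOM.findall(smi))

conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL as heavy_atoms(smi).
conn.create_function("heavy_atoms", 1, heavy_atoms)

conn.execute("Create Table structure (id integer Primary Key, smi text)")
conn.executemany(
    "Insert Into structure (id, smi) Values (?, ?)",
    [
        (1, "c1ccccc1C(=O)NC"),
        (2, "CCO"),
        (3, "C1(C(C(C(C(C1O)O)OP(=O)(O)O)O)O)O"),
    ],
)

# The property is computed at query time; no extra column is stored.
rows = conn.execute(
    "Select id, heavy_atoms(smi) From structure Order By id"
).fetchall()
print(rows)  # [(1, 10), (2, 3), (3, 16)]
```

Whether to compute a value this way or store it in a column depends on the trade-off described above: a value that is searched often is usually worth storing and indexing, while one that is rarely needed can be computed when requested.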

Other properties are not unique, for example, chemical names. These should be stored in a separate table with one row for each value. For example, the entry in the PubChem database contains 10 synonyms for the SMILES C1(C(C(C(C(C1O)O)OP(=O)(O)O)O)O)O, as shown in Table 13.1. Each of these should be entered as a separate row in a table of names along with a column containing the compound id. A simple table of this type would be created using the following SQL.

Create Table names (cid integer References structure (id), name text);

The cid column is a foreign key referencing the id column of the structure table. This prevents any names from being entered that do not have a corresponding entry in the structure table. It also associates the name with the proper structure. As shown in earlier chapters, names and SMILES can be selected from the tables in this schema using the following SQL.

Select smi, name From structure Join names On (id=cid);
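The behavior of the foreign key and the join can be tried with an in-memory SQLite database. This is a minimal sketch, not the schema used elsewhere in the book: the structure table is reduced to two columns, the sample row (benzene with two common names) is chosen only for illustration, and SQLite needs foreign key enforcement switched on explicitly, which other database systems do not require.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite does not enforce foreign keys unless this pragma is set.
conn.execute("PRAGMA foreign_keys = ON")

# Simplified structure table (assumption: only id and smi are needed here).
conn.execute("Create Table structure (id integer Primary Key, smi text)")
conn.execute(
    "Create Table names (cid integer References structure (id), name text)"
)

conn.execute("Insert Into structure (id, smi) Values (1, 'c1ccccc1')")
# One row per name, each carrying the compound id.
conn.executemany(
    "Insert Into names (cid, name) Values (?, ?)",
    [(1, "benzene"), (1, "benzol")],
)

# A name whose cid has no matching structure row is rejected.
try:
    conn.execute("Insert Into names (cid, name) Values (99, 'orphan name')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

rows = conn.execute(
    "Select smi, name From structure Join names On (id = cid) Order By name"
).fetchall()
print(rejected)  # True
print(rows)      # [('c1ccccc1', 'benzene'), ('c1ccccc1', 'benzol')]
```

The join returns one row per name, so a structure with several synonyms appears several times in the result, once for each row in the names table.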

Any number of other tables can be added to this schema. Each should be related to the structure table using the compound id. Aside from simply registering compounds, it might be required to store experimental data
