Chemistry Reference
In-Depth Information
understanding of the data values and relationships among them is nec-
essary. Often the data will be made available in flat files with little or no
structure. It is important to review these files and create a schema of tables
appropriate for these data. Chapter 6 showed how this might be done for
data obtained from the PubChem project. 2 The concepts of unique com-
pound ids and sample ids was used there, along with a separate schema
for assay results containing references to sample ids. Chapter 11 showed
an example of importing data on VLA-4 3 Integrin antagonists with a sim-
pler data organization. In both cases, these were imported into new schema
designed to it those data. It may be desirable to integrate newly imported
data into existing schemas in the database. For example, if the new data
contains information about compounds already in the registry tables, that
relationship should be recorded. Similarly, if the imported data contains
compounds not already in the registry, they could be added to the registry.
The following SQL function returns a compound id that can be used
when processing data to be imported from an external source. The func-
tion returns the structure.id of an existing compound, or creates a new
entry in the structure table for a new compound if necessary and returns
that new structure.id .
Create Or Replace Function registry.compound_id (Text) Returns
Integer As $EOSQL $
Declare
cid Integer;
Begin
Select id Into cid from structure Where smi = isosmiles($1);
If cid Is Null Then
Insert into structure (smi) Values ($1);
Select id Into cid From structure Where smi = isosmiles($1);
End If;
Return cid;
End;
$EOSQL$ Language plpgsql;
This function is placed into the registry schema and would be called as
in the following example.
Select registry.compound_id('SCCS');
As compounds are processed from the external source, say from an sdf
file, each value of the SMILES would be passed to the compound _ id
function and the resulting id would be used in the column of the new
table that references the registry.structure id column.
As compounds are added to the registry from various sources, it may
be necessary to record where these compounds came from. This intro-
duces another generally useful feature in a registry of compounds. An
Search WWH ::




Custom Search