Figure 11.1
Entity relationship diagram for VLA4 schema. [Diagram not reproduced. It shows three tables linked through their id columns: structure (id, name, cansmiles, coord, atom), property (id, name, tvalue, nvalue), and sdf (id, molfile).]
example, SDF files were obtained from QSAR World,³ a Web resource that curates dozens of data sets used in quantitative structure-activity relationship (QSAR) studies. The VLA-4⁴ integrin antagonists were selected. This file contains structures and data for 94 compounds.⁵
One way to organize tables in a database is to define a new schema to contain related tables. Here, we will create a schema named vla4. Expanding on the example from the previous chapter, the following three tables are suggested as a starting point. The entity relationship diagram in Figure 11.1 illustrates the vla4 schema.
Create Schema vla4;
Create Table vla4.sdf (id Integer, molfile Text);
Create Table vla4.structure (id Integer, name Text, cansmiles Text,
coord Float[][3], atom Integer[]);
Create Table vla4.property (id Integer, name Text, tvalue Text,
nvalue Numeric);
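To see how the shared id ties the three tables together, a minimal sketch follows. It is an illustration under stated assumptions, not part of the workflow described here: it uses Python's built-in sqlite3 module in place of PostgreSQL, attaches an in-memory database under the name vla4 so that the qualified table names resolve, stores the two array columns as JSON text because SQLite has no array type, and fills the tables with invented placeholder rows rather than VLA-4 data.

```python
import json
import sqlite3

# SQLite stands in for PostgreSQL (an assumption made so the sketch runs anywhere).
conn = sqlite3.connect(":memory:")
# Attaching a second in-memory database named vla4 lets the qualified
# names vla4.sdf, vla4.structure and vla4.property be used as in the text.
conn.execute("ATTACH DATABASE ':memory:' AS vla4")
conn.execute("CREATE TABLE vla4.sdf (id INTEGER, molfile TEXT)")
conn.execute(
    "CREATE TABLE vla4.structure "
    "(id INTEGER, name TEXT, cansmiles TEXT, coord TEXT, atom TEXT)"
)
conn.execute(
    "CREATE TABLE vla4.property "
    "(id INTEGER, name TEXT, tvalue TEXT, nvalue NUMERIC)"
)

# Placeholder rows (invented for illustration, not from the VLA-4 file).
conn.execute(
    "INSERT INTO vla4.sdf VALUES (1, ?)", ("(molfile text for compound 1)",)
)
conn.execute(
    "INSERT INTO vla4.structure VALUES (1, 'compound-1', 'CCO', ?, ?)",
    (
        json.dumps([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.2, 0.0]]),
        json.dumps([1, 2, 3]),
    ),
)
conn.executemany(
    "INSERT INTO vla4.property VALUES (?, ?, ?, ?)",
    [(1, "activity", None, 7.2), (1, "source", "placeholder", None)],
)

# The shared id joins each structure to all of its property rows.
rows = conn.execute(
    "SELECT s.name, p.name, p.tvalue, p.nvalue "
    "FROM vla4.structure AS s JOIN vla4.property AS p ON p.id = s.id "
    "ORDER BY p.name"
).fetchall()
for row in rows:
    print(row)
```

Each property row carries either a text value or a numeric value, which is why the property table has both a tvalue and an nvalue column.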
The column structure.id is a unique integer relating the structure, sdf, and property tables. The sdf.molfile column contains the molfile for each structure as defined by the vendor. The structure.name and structure.cansmiles columns contain the name and canonical SMILES parsed and computed from the molfile. The structure.coord column will contain an array of atomic coordinates. The structure.atom column will contain an array of atom numbers from the file in canonical order, corresponding to the atom order in the canonical SMILES. The OpenBabel/plpythonu extension functions molfile_mol and molfile_properties will be used to parse the vendor SDF molfiles and populate these tables. The molfile column of the sdf table is first populated from the SDF file, using the following Perl script.
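As a rough illustration of this loading step in Python rather than Perl, the sketch below splits SDF text into records at the $$$$ delimiter lines and pairs each record with a sequential id, giving rows suitable for the sdf table. The function name split_sdf_records, the choice to number records from 1, the decision to keep the data fields together with each record, and the two-record sample text are all assumptions made for illustration; this is not the script used in the text.

```python
def split_sdf_records(text):
    """Split SDF text into records, one per '$$$$' delimiter line.

    Each returned record keeps everything before its delimiter
    (the connection table plus any data fields) and drops the
    delimiter line itself.
    """
    records = []
    current = []
    for line in text.splitlines():
        if line.rstrip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
        else:
            current.append(line)
    # Keep a trailing record that lacks a final delimiter, if it has content.
    if any(part.strip() for part in current):
        records.append("\n".join(current) + "\n")
    return records


# Hypothetical two-record sample; the empty connection tables are placeholders.
SAMPLE_SDF = (
    "mol-A\n  sketch\n\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "> <activity>\n7.2\n\n"
    "$$$$\n"
    "mol-B\n  sketch\n\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END\n"
    "> <activity>\n6.1\n\n"
    "$$$$\n"
)

# Pair each record with a sequential id, giving (id, molfile) rows.
rows = list(enumerate(split_sdf_records(SAMPLE_SDF), start=1))
for record_id, molfile in rows:
    print(record_id, molfile.splitlines()[0])
```

Rows in this (id, molfile) form could then be passed to a parameterized INSERT into vla4.sdf through whichever database driver is in use.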