Chemistry Reference
In-Depth Information
Insert Into vla4.property (id, name, tvalue)
Select id, (molfile_properties).name, (molfile_properties).value From
(Select id, openbabel.molfile_properties(molfile) From vla4.sdf) atmp
This
insert
statement is a bit more complex than the one that inserted
rows into the
vla4.structure
table. In that table, only one row was
returned from the
molfile _ mol
function. The
molfile _ properties
function returns multiple rows for each molfile, when there are multiple
properties for each molfile. The second
select
statement above (the one
in parentheses and identified with the name
atmp
) selects all the rows for
each molfile. The first
select
statement selects all the columns from each
returned row from
molfile _ properties
. These are then inserted into
the
vla4.property
table.
Finally, the nvalue column of the vla4.property table can be populated
when possible. This column stores the numerical value of the property.
Since not all values are numerical, this column may have null entries. The
purpose is to enable efficient use of numerical data when appropriate,
for example, to select by value, sort, apply mathematical functions, etc.
The following SQL will update the
nvalue
column when possible with
a numeric value. The tilde operator in the where selects text values that
match the regular expression. The expression shown here allows integers,
decimal values, and scientific notation using E or e for the exponent, for
example 6.023E23.
Update vla4.property Set nvalue = tvalue::numeric
Where tvalue ~ E'^[+-]?[0-9]+(\\\\.[0-9]*)?([Ee][+-]?[0-9]+)?\$';
It is possible to create additional columns in the property table, for exam-
ple to contain an integer representation of the data value if that is consid-
ered necessary.
Once the data from the molfiles are extracted and loaded into the
val4.sdf
table, the table could be deleted. The
vla4.sdf
table might
also be retained as a backup of the original sdf file, easily accessible from
within the database for further processing. In either case, the
vla4.sdf
table will not create any overhead while using the
vla4.structure
or
vla4.property
tables. It will merely use space in the
vla4
schema, in
the same way that a backup copy of the original sdf file would use space
in a folder on the computer's disk.
It may seem that some information will have been lost if the original
molfile is discarded. For example, the list of atomic symbols and bonds is
not stored directly in the
vla4.structure
table. The cansmiles string,
however, does contain this information. It may be necessary for some pur-
poses to extract this information, for example, when an external program
does not read SMILES but instead requires a list of atomic symbols and
bonds. This is discussed more in the next section. A connection table can
Search WWH ::
Custom Search