13.5 Utilities
In any project, there will be collections or files of compounds that need
to be processed. This chapter and previous chapters have shown ways in
which these can be usefully imported into a database. Traditionally these
files are processed in some way without being imported into a database.
There are many utility functions to carry out operations such as locating
structures within a file, finding nearest neighbors, clustering compounds,
displaying common substructures, etc. These often take the form of
command line tools, or methods within a programming environment such as
Python. If these tools are collected together and placed as functions in an
RDBMS, these utilities can be used from within the database. They can
also be used as command line tools, or integrated into a programming
environment. This section will show how some of these operations can be
carried out as command line utilities.
Every one of these utilities will first require that a file of structures be
loaded into a table in the database. Two methods are shown here: importing
a SMILES file and a mol file. Other file types could be added as needed,
extending the core functions described earlier using molfile_mol or
molfile_to_smiles as a model. OpenBabel is a good choice because
of its support of many file formats.
A SMILES file is readily imported into the database using the following
Perl script, smiloader. The output of this command is a set of SQL
commands interspersed with lines from the input SMILES file. The file is
minimally processed. The script expects the name of a schema in which
the tables will be created. The entire Perl script is shown in the Appendix.
It is used at the Linux command line as follows. The schema name here is
drugs, the first argument to smiloader.
perl smiloader drugs <drugs.smi | psql mydb
The SQL output is piped to the psql command, which processes the commands.
The schema and tables are created in a database named mydb in this example.
If no database name is given, psql assumes a database with the same
name as the user's login name. The table created by smiloader contains four
columns: name text, id integer, isosmiles text, and fp bit varying.
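The idea of turning a SMILES file into SQL that psql can consume can be sketched in a few lines of Python. This is only an illustration, not the smiloader script from the Appendix: the table name structure, the use of COPY ... FROM stdin, and the assumption that each input line has the form "SMILES name" are all choices made for this sketch. The raw SMILES is stored in isosmiles without canonicalization, and fp is left empty.

```python
def copy_escape(value):
    """Escape a value for PostgreSQL COPY text format.

    Backslashes (common in SMILES with cis/trans bonds) must be doubled,
    and tabs would otherwise be read as column separators.
    """
    return value.replace("\\", "\\\\").replace("\t", " ")


def smiles_to_sql(lines, schema, table="structure"):
    """Yield SQL text that creates schema.table and bulk-loads SMILES rows.

    `table` is a hypothetical name chosen for this sketch. Each input line
    is assumed to be "SMILES<whitespace>name"; blank lines are skipped.
    """
    if not schema.isidentifier() or not table.isidentifier():
        raise ValueError("schema and table must be simple identifiers")
    yield f"CREATE SCHEMA {schema};"
    yield (f"CREATE TABLE {schema}.{table} "
           "(name text, id integer, isosmiles text, fp bit varying);")
    # Rows follow the COPY command directly, one tab-separated line each.
    yield f"COPY {schema}.{table} (name, id, isosmiles) FROM stdin;"
    row_id = 0
    for line in lines:
        parts = line.split(None, 1)
        if not parts:
            continue
        row_id += 1
        smiles = parts[0]
        name = parts[1].strip() if len(parts) > 1 else ""
        yield f"{copy_escape(name)}\t{row_id}\t{copy_escape(smiles)}"
    yield "\\."  # end-of-data marker for COPY ... FROM stdin


demo = ["CCO ethanol\n", "\n", "F/C=C\\F difluoroethene\n"]
for statement in smiles_to_sql(demo, "drugs"):
    print(statement)
```

Reading the lines from sys.stdin instead of the demo list would allow the output to be piped to psql in the same way as the command above.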
Loading an SDF file is similar, although additional tables are created to
accommodate the data items in the file and to contain the original file. The
sdfloader script is used as in the following example.
perl sdfloader vla4 <vla-4.sdf | psql mydb
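To see why an SDF file needs more than one table, it helps to look at how a record is laid out: a molfile block ending in "M  END", followed by named data items, with "$$$$" closing the record. The Python sketch below splits a file into those two parts. It is not the sdfloader script, and it makes no assumption about the table layout that script creates; the function name parse_sdf and the sample record are invented for illustration.

```python
def parse_sdf(lines):
    """Split SDF lines into a list of (molfile_text, data_items) records.

    data_items maps each field name (from a "> <NAME>" header line) to its
    value; multi-line values are joined with newlines.
    """
    records = []
    mol_lines, data, field, in_data = [], {}, None, False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":
            # End of record: store it and reset the state.
            joined = {k: "\n".join(v) for k, v in data.items()}
            records.append(("\n".join(mol_lines), joined))
            mol_lines, data, field, in_data = [], {}, None, False
        elif not in_data:
            mol_lines.append(line)
            if line.startswith("M  END"):
                in_data = True  # data items follow the molfile block
        elif line.startswith(">"):
            # Header such as "> <IC50>": the name sits between < and >.
            start, end = line.find("<"), line.rfind(">")
            field = line[start + 1:end] if 0 <= start < end else line[1:].strip()
            data[field] = []
        elif field is not None:
            if line:
                data[field].append(line)
            else:
                field = None  # a blank line ends the current data item
    return records


sample = """ethanol
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <NAME>
ethanol

> <IC50>
12.5

$$$$
"""
records = parse_sdf(sample.splitlines())
print(records[0][1])
```

Each record yields one structure plus a variable set of name and value pairs, which is the kind of data that is naturally stored in separate tables.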
Figure 13.3 shows an entity relationship diagram for the tables created by
the sdfloader script. Once files have been loaded into the database, the
dbutils shell script is run to define several utility functions that operate
on these tables. The dbutils script is listed in the Appendix. This