statistical software package R and discusses its performance in the context of
a biology problem.
8.3 Analysis of Cheminformatics Data
In this first application, we use cheminformatics as a problem domain to illus-
trate the importance of descriptors that are used to describe the objects being
considered in the pattern recognition step of the analysis. Cheminformatics is
a term used for the broad field of organizing, storing, retrieving, mining, and
analyzing vast amounts of chemical information. It has extensive applications
in the field of drug discovery and development. 2 The problems in cheminfor-
matics entail the development of well-organized chemical databases to store
chemical information like chemical structures and properties, the development
of algorithms that can operate effectively and efficiently on these databases to
extract relevant information for a given problem, and the development of ef-
fective ways of visualizing the information to make informed decisions during
the drug discovery process.
A large number of public and proprietary databases such as PubChem,
ChemBank, DrugBank, ChemDB, and MDDR have been developed that or-
ganize basic chemical information such as the 2D/3D structure(s) of chemical
entities or compounds, as well as their physical properties such as molecular
weight, polarity, water solubility, and lipophilicity. 3 Many of these databases
contain advanced information such as structural descriptors, key functional
groups (hydrogen bond donors/acceptors), references to known biological tar-
gets, and references to relevant literature. 3
Once all the pertinent information is organized in a suitable database for-
mat, the next step is to develop algorithms to generate, retrieve, mine, and an-
alyze this information in the context of a particular problem in drug discovery.
The typical algorithmic tasks performed on chemical structure(s) include cal-
culation of physical and biological characteristics of chemical compounds from
first principles, searching, clustering, classification/regression, and docking. 2, 4
Most of these algorithms operate on the assumption that the properties
and biological activity of a chemical compound are related to its structure. 2, 4
Hansch et al. 5 demonstrated that the biological activity of a chemical compound
can be mathematically expressed as a function of its physicochemical
properties, which led to the development of quantitative methods for
modeling structure-activity relationships (QSAR). Since that work, many dif-
ferent approaches have been developed for building such structure-activity-
relationship (SAR) models. These models have become an essential tool for
predicting biological activity from the structural properties of a molecule.
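To make the idea of expressing activity as a function of physicochemical properties concrete, the following is a minimal sketch of a single-descriptor linear model fitted by ordinary least squares. It is an illustration only, not the method of Hansch et al.: the function names, the choice of lipophilicity (logP) as the sole descriptor, and the data values are all assumptions made up for this example. Practical QSAR models typically combine several descriptors.

```python
from statistics import mean


def fit_linear_qsar(descriptor, activity):
    """Least-squares fit of: activity = slope * descriptor + intercept.

    Hypothetical helper for illustration; one descriptor only.
    """
    x_bar, y_bar = mean(descriptor), mean(activity)
    # Sum of squared deviations of the descriptor values.
    sxx = sum((x - x_bar) ** 2 for x in descriptor)
    # Sum of cross-deviations between descriptor and activity.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict_activity(model, descriptor_value):
    """Predict activity for a new compound from its descriptor value."""
    slope, intercept = model
    return slope * descriptor_value + intercept


# Made-up training data (not from any real assay): lipophilicity values
# and activities that lie exactly on the line 0.5 * logP + 2.0.
log_p = [1.0, 2.0, 3.0, 4.0]
activity = [2.5, 3.0, 3.5, 4.0]

model = fit_linear_qsar(log_p, activity)
predicted = predict_activity(model, 5.0)
```

Because the illustrative data are exactly linear, the fit recovers the generating coefficients; with real measurements the residuals would indicate how well a single descriptor explains the observed activity.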
The extensive use of cheminformatics methods has led to the development
of a number of large commercial software packages such as Daylight's Toolkit, 6