Scientific Data Analysis - Scientific Data Management

statistical software package R and discusses its performance in the context of a biology problem.

8.3 Analysis of Cheminformatics Data

In this first application, we use cheminformatics as a problem domain to
illustrate the importance of the descriptors used to describe the objects being
considered in the pattern recognition step of the analysis. Cheminformatics is
a term used for the broad field of organizing, storing, retrieving, mining, and
analyzing vast amounts of chemical information. It has extensive applications
in the field of drug discovery and development [2]. The problems in
cheminformatics entail the development of well-organized chemical databases to
store chemical information such as chemical structures and properties, the
development of algorithms that can operate effectively and efficiently on these
databases to extract relevant information for a given problem, and the
development of effective ways of visualizing the information to make informed
decisions during the drug discovery process.

A large number of public and proprietary databases, such as PubChem, ChemBank,
DrugBank, ChemDB, and MDDR, have been developed that organize basic chemical
information such as the 2D/3D structure(s) of chemical entities or compounds,
as well as their physical properties such as molecular weight, polarity, water
solubility, and lipophilicity [3]. Many of these databases also contain
advanced information such as structural descriptors, key functional groups
(hydrogen bond donors/acceptors), references to known biological targets, and
references to relevant literature [3].
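To make the kind of record described above concrete, here is a minimal sketch of how one database entry might be represented and queried. The class name, field names, and the `lighter_than` helper are hypothetical and do not correspond to the schema of PubChem or any other database named here; the two entries use standard molecular weights for ethanol and benzene purely as illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CompoundRecord:
    """One database entry; all field names here are hypothetical."""
    name: str
    smiles: str                      # 2D structure written as a SMILES string
    molecular_weight: float          # g/mol
    log_p: Optional[float] = None    # lipophilicity, left unset when unknown
    h_bond_donors: int = 0
    h_bond_acceptors: int = 0
    known_targets: List[str] = field(default_factory=list)
    literature_refs: List[str] = field(default_factory=list)


def lighter_than(records: List[CompoundRecord], max_weight: float) -> List[CompoundRecord]:
    """Return the records whose molecular weight does not exceed max_weight."""
    return [r for r in records if r.molecular_weight <= max_weight]


# Two illustrative entries (ethanol and benzene).
records = [
    CompoundRecord("ethanol", "CCO", 46.07, h_bond_donors=1, h_bond_acceptors=1),
    CompoundRecord("benzene", "c1ccccc1", 78.11),
]

hits = lighter_than(records, 50.0)
print([r.name for r in hits])
```

A real chemical database would index such fields so that property filters like this one do not require a full scan.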

Once all the pertinent information is organized in a suitable database format,
the next step is to develop algorithms to generate, retrieve, mine, and analyze
this information in the context of a particular problem in drug discovery. The
typical algorithmic tasks performed on chemical structure(s) include
calculation of physical and biological characteristics of chemical compounds
from first principles, searching, clustering, classification/regression, and
docking [2, 4].
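As an illustration of the searching task, the sketch below ranks library compounds by similarity to a query. It assumes each compound has already been reduced to a fingerprint (a set of integer feature identifiers) and uses the Tanimoto coefficient as the similarity measure. Both choices are assumptions made for this example rather than methods prescribed by the text, and the fingerprints and compound names are toy values.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]  # set of "on" feature identifiers (assumed representation)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared features divided by total distinct features."""
    shared = len(a & b)
    union = len(a) + len(b) - shared
    return shared / union if union else 0.0


def similarity_search(
    query: Fingerprint, library: Dict[str, Fingerprint], threshold: float
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs scoring at least threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Toy fingerprints; the compound names are placeholders.
library = {
    "cmpd_a": frozenset({1, 2, 3}),
    "cmpd_b": frozenset({2, 3, 4}),
    "cmpd_c": frozenset({7, 8}),
}
query = frozenset({1, 2, 3})
results = similarity_search(query, library, threshold=0.4)
print(results)
```

The same pairwise score could also serve as the basis for clustering, since it provides a similarity between any two compounds.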

Most of these algorithms operate on the assumption that the properties and
biological activity of a chemical compound are related to its structure [2, 4].
Hansch et al. [5] demonstrated that the biological activity of a chemical
compound can be mathematically expressed as a function of its physicochemical
properties, which led to the development of quantitative methods for modeling
structure-activity relationships (QSAR). Since that work, many different
approaches have been developed for building such structure-activity
relationship (SAR) models. These models have become an essential tool for
predicting biological activity from the structural properties of a molecule.
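The idea of expressing activity as a function of physicochemical properties can be sketched with the simplest possible model: a straight line fitted by ordinary least squares to a single descriptor. This is only an illustrative assumption, not the functional form used by Hansch et al., and the data below are synthetic, generated exactly from activity = 2 * descriptor + 1.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic training set: one descriptor value and one activity value per compound.
descriptor = [0.0, 1.0, 2.0, 3.0]
activity = [1.0, 3.0, 5.0, 7.0]  # exactly 2 * descriptor + 1, not measured data

slope, intercept = fit_line(descriptor, activity)

# Predict the activity of an unseen compound from its descriptor value.
predicted = slope * 1.5 + intercept
print(slope, intercept, predicted)
```

Practical QSAR models typically combine several descriptors and may use nonlinear terms, but the workflow is the same: fit on compounds with known activity, then predict for new ones.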

The extensive use of cheminformatics methods has led to the development of a
number of large commercial software packages such as Daylight's Toolkit [6],
