superior results in practice. For chemical compound classification with FS
descriptors, we have developed feature selection techniques that take into
account both the class distribution of a feature (whether it occurs
predominantly in active or in inactive compounds) and its prevalence (support)
in the dataset. Our results showed that features selected with this technique
lead to better models for SVM-based classification. 18
Overall, these studies show that support vector machines are a powerful and
flexible methodology for building predictive models, and that the resulting
models often outperform those of other supervised learning approaches.
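The feature selection idea described above, keeping features that are both
sufficiently prevalent and concentrated in one class, can be sketched as
follows. This is a minimal illustration under stated assumptions, not the
procedure from the cited study: the function name select_features, the skew
measure, and both thresholds are hypothetical, and each compound is represented
simply as the set of descriptor identifiers it contains.

```python
from typing import Dict, List, Sequence, Set


def select_features(
    compounds: Sequence[Set[str]],
    labels: Sequence[int],
    min_support: float = 0.1,
    min_skew: float = 0.7,
) -> List[str]:
    """Keep features that are frequent enough and concentrated in one class.

    compounds: one set of descriptor identifiers per compound.
    labels: 1 for an active compound, 0 for an inactive one.
    min_support, min_skew: illustrative thresholds (assumptions).
    """
    n = len(compounds)
    total: Dict[str, int] = {}   # number of compounds containing the feature
    active: Dict[str, int] = {}  # number of active compounds containing it

    for feats, label in zip(compounds, labels):
        for f in feats:
            total[f] = total.get(f, 0) + 1
            if label == 1:
                active[f] = active.get(f, 0) + 1

    selected = []
    for f, count in total.items():
        support = count / n                       # prevalence in the dataset
        frac_active = active.get(f, 0) / count    # class distribution
        skew = max(frac_active, 1.0 - frac_active)
        if support >= min_support and skew >= min_skew:
            selected.append(f)
    return sorted(selected)


if __name__ == "__main__":
    # Toy data: descriptor identifiers are made up for the example.
    compounds = [
        {"C-C", "C=O"}, {"C-C", "C-N"}, {"C=O"},
        {"C-C", "C-N"}, {"C-N"}, {"C-C"},
    ]
    labels = [1, 0, 1, 0, 0, 1]
    print(select_features(compounds, labels, min_support=0.3, min_skew=0.7))
```

In this toy data, "C-C" occurs equally often in active and inactive compounds
and is dropped, while "C=O" (active only) and "C-N" (inactive only) pass both
thresholds. The selected features would then be the input columns for an SVM
classifier. With strongly imbalanced classes, the raw fraction used here would
likely need to be normalized by class size.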
8.3.5 Future Direction of Cheminformatics Data Analysis
Structure-activity relationships (SAR), which play a key role in drug
discovery, have traditionally been analyzed by mining and retrieving data for a
single biomolecular target and building SAR models on it. However, in recent years
the widespread use of high-throughput screening (HTS) technologies by the
pharmaceutical industry has generated a wealth of protein-ligand activity data
for large compound libraries. These data have been systematically collected
and stored in centralized databases. 23 At the same time, the completion of
the human genome sequencing project has provided a large number of
“druggable” protein targets 24 that can be used for therapeutic purposes.
Additionally, a large fraction of the protein targets that have been or are currently
being investigated for therapeutic purposes belong to a small number of gene
families. 25 Together, these three factors have led to the development of
methods that use information beyond the traditional analysis of data for a
single biomolecular target. Recently, the trend has been to integrate
cheminformatics data with protein and genetic data (bioinformatics data) and to
analyze the problem over multiple proteins or different protein families.
Various approaches are being developed to aid the drug discovery process:
identifying active compounds for a new target within the same family
(chemogenomics), 26 discovering new biology (chemical genetics), 27 discovering
new targets for well-characterized chemical compounds (target fishing), 28 and
establishing the promiscuity or selectivity of chemical compounds
(polypharmacology). 23
8.4 Computational Prediction of Protein Function: Survey and Enhancements
In our second example, we consider a different application domain, namely,
bioinformatics, where we focus on the important problem of predicting the
function of proteins. We also discuss the role that noise can play in our ability
to identify the patterns in the data.