Gaussian or polynomial kernel functions are often used in LBVS in combination with numerical property descriptors or two-dimensional fingerprints, but simple linear kernels have also been employed successfully [85-87]. The kernel approach is flexible and makes it possible to evaluate the similarity of objects using different features and similarity measures. A variety of kernel functions have been introduced for SVM modeling, including both ligand and target kernels that capture rather different information for similarity assessment [88,89], such as graph or descriptor similarity (compounds) and sequence or binding-site similarity (target proteins). Ligand and target kernels can be combined for model building.
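As a brief illustration of how ligand and target kernels can be combined, one common construction (a sketch, not necessarily the specific formulation of [88,89]) is the pairwise product of a ligand kernel and a target kernel evaluated on (ligand, target) training pairs; the product of two valid kernels is again a valid kernel. The matrices below are hypothetical precomputed similarity values.

```python
import numpy as np

def combined_kernel(K_ligand, K_target):
    """Combine ligand and target kernels for paired (ligand, target) examples
    by taking their elementwise product; the result is again a valid kernel."""
    return K_ligand * K_target

# Hypothetical precomputed kernel matrices for four (ligand, target) pairs.
K_lig = np.array([[1.0, 0.6, 0.2, 0.1],
                  [0.6, 1.0, 0.3, 0.2],
                  [0.2, 0.3, 1.0, 0.7],
                  [0.1, 0.2, 0.7, 1.0]])
K_tar = np.array([[1.0, 0.9, 0.4, 0.4],
                  [0.9, 1.0, 0.4, 0.4],
                  [0.4, 0.4, 1.0, 0.8],
                  [0.4, 0.4, 0.8, 1.0]])

K_joint = combined_kernel(K_lig, K_tar)  # usable with SVC(kernel="precomputed")
```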
For the assessment of compound similarity, a Tanimoto kernel has been introduced [90] that is widely applied in LBVS, given the popularity of the Tanimoto coefficient.
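A minimal sketch of a Tanimoto kernel for binary fingerprints, usable as a callable kernel with scikit-learn's SVC (the helper name and settings are illustrative, not taken from [90]):

```python
import numpy as np
from sklearn.svm import SVC

def tanimoto_kernel(X, Y):
    """Tanimoto kernel for binary fingerprint matrices:
    K(x, y) = <x, y> / (<x, x> + <y, y> - <x, y>)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    prod = X @ Y.T
    x_sq = (X * X).sum(axis=1)[:, None]
    y_sq = (Y * Y).sum(axis=1)[None, :]
    denom = x_sq + y_sq - prod
    denom[denom == 0.0] = 1.0  # guard against all-zero fingerprints
    return prod / denom

# scikit-learn accepts a callable that returns the Gram matrix between X and Y.
svm = SVC(kernel=tanimoto_kernel, C=1.0)
```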
A special feature of the SVM approach is that it can not only be used for class label predictions but also for database ranking [87,91,92]. This is accomplished by calculating the signed distance of test compounds from the separating hyperplane. The underlying idea is that the farther test compounds on the active side lie from the hyperplane (and from the negative training examples), the greater the likelihood of activity. Conversely, the farther compounds on the inactive side lie from the hyperplane (and from the positive training examples), the more likely they are to be inactive.
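A sketch of ranking by signed distance from the hyperplane, assuming hypothetical binary fingerprints and activity labels (the data below are random placeholders): in scikit-learn, decision_function returns the signed decision value, which is proportional to the distance from the separating hyperplane and can be used directly to rank a database.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(200, 64))   # hypothetical training fingerprints
y_train = rng.integers(0, 2, size=200)         # 1 = active, 0 = inactive
X_db = rng.integers(0, 2, size=(1000, 64))     # database compounds to be ranked

svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

# Signed decision values: positive on the "active" side of the hyperplane,
# negative on the "inactive" side; larger values suggest a higher likelihood of activity.
scores = svm.decision_function(X_db)
ranking = np.argsort(-scores)  # compound indices from most to least active-like
```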
Similarity searching has also been compared directly to SVM classification and ranking. A noteworthy finding has been that SVM calculations using two-dimensional fingerprints as descriptors typically produce higher compound recall and hit rates than similarity searching with the same fingerprints [87]. Currently, SVM learning is a preferred approach for LBVS based on two-dimensional molecular representations and often yields better results than other methods. Another kernel-based approach adapted for LBVS is binary kernel discrimination [93,94], which also uses fingerprints as descriptors and combines the kernel function methodology with likelihood estimates of compound feature probability distributions.
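The following is a minimal sketch of binary kernel discrimination along the lines described above (the function name, the smoothing parameter lam, and the toy dimensions are assumptions, not the exact formulation of [93,94]): each test compound is scored by the ratio of summed kernel values over active versus inactive training compounds, with a kernel based on the Hamming distance between fingerprints.

```python
import numpy as np

def bkd_scores(X_train, y_train, X_test, lam=0.9):
    """Score test fingerprints by the ratio of summed kernel values over
    active vs. inactive training compounds. The kernel depends on the
    Hamming distance d between bit strings of length n:
        K(x, y) = lam**(n - d) * (1 - lam)**d
    where lam (0.5 < lam < 1) is a smoothing parameter to be tuned."""
    n_bits = X_train.shape[1]
    # Hamming distance between every test and every training fingerprint.
    d = (X_test[:, None, :] != X_train[None, :, :]).sum(axis=2)
    K = lam ** (n_bits - d) * (1.0 - lam) ** d
    active = y_train == 1
    # For long fingerprints the kernel values underflow; a log-space
    # formulation would then be preferable in practice.
    return K[:, active].sum(axis=1) / K[:, ~active].sum(axis=1)
```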
15.8.3 Bayesian Methods
Statistical modeling approaches based on Bayes' theorem [82] have become very popular in LBVS. Bayesian modeling methods for activity prediction generally derive numerical probabilities of compound activity from feature (descriptor value) distributions in training sets. Therefore, Bayesian methods also produce data set rankings. In LBVS, naive Bayesian classifiers using fingerprints as descriptors are widely used [95,96]. This approach assumes that all features are independent of each other (the "naive" assumption) and that numerical descriptor values are normally distributed; both assumptions are only approximations for chemical descriptor data. Feature probabilities are derived from the frequencies observed for positive and negative training examples. Based on the resulting likelihood estimates, a probability of activity is assigned to each test compound according to its descriptor values.
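A minimal sketch of such a naive Bayesian classifier for fingerprint descriptors, here using scikit-learn's BernoulliNB with hypothetical random data as a placeholder: per-bit feature probabilities are estimated from their frequencies among active and inactive training compounds (with Laplace smoothing), and the resulting posterior probability of activity is used to rank the database.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(500, 128))   # hypothetical training fingerprints
y_train = rng.integers(0, 2, size=500)          # 1 = active, 0 = inactive
X_db = rng.integers(0, 2, size=(2000, 128))     # database compounds to be ranked

# Per-bit probabilities are estimated separately for each class from
# training-set frequencies; prediction assumes feature independence.
clf = BernoulliNB(alpha=1.0).fit(X_train, y_train)

p_active = clf.predict_proba(X_db)[:, 1]  # posterior probability of the active class
ranking = np.argsort(-p_active)           # most to least likely active
```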
Despite the underlying approximations, naive Bayesian classifiers have proven to be versatile and robust machine learning approaches for compound classification and hit identification. They have been derived for many different compound activity classes and have also been used for in silico profiling of