Gaussian or polynomial kernel functions are often used in LBVS in combination with numerical property descriptors or two-dimensional fingerprints, but simple linear kernels have also been employed successfully [85-87]. The kernel approach is flexible and makes it possible to evaluate the similarity of objects using different features and similarity measures. A variety of kernel functions have been introduced for SVM modeling, including both ligand and target kernels that capture rather different information for similarity assessment [88,89], such as graph or descriptor similarity (compounds) and sequence or binding-site similarity (target proteins). Ligand and target kernels can be combined for model building.
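As a brief illustration of how ligand and target kernels can be combined, one common construction (a sketch, not necessarily the specific formulation of [88,89]) is the pairwise product of a ligand kernel and a target kernel evaluated on (ligand, target) training pairs; the product of two valid kernels is again a valid kernel. The matrices below are hypothetical precomputed similarity values.

```python
import numpy as np

def combined_kernel(K_ligand, K_target):
    """Combine ligand and target kernels for paired (ligand, target) examples
    by taking their elementwise product; the result is again a valid kernel."""
    return K_ligand * K_target

# Hypothetical precomputed kernel matrices for four (ligand, target) pairs.
K_lig = np.array([[1.0, 0.6, 0.2, 0.1],
                  [0.6, 1.0, 0.3, 0.2],
                  [0.2, 0.3, 1.0, 0.7],
                  [0.1, 0.2, 0.7, 1.0]])
K_tar = np.array([[1.0, 0.9, 0.4, 0.4],
                  [0.9, 1.0, 0.4, 0.4],
                  [0.4, 0.4, 1.0, 0.8],
                  [0.4, 0.4, 0.8, 1.0]])

K_joint = combined_kernel(K_lig, K_tar)  # usable with SVC(kernel="precomputed")
```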
For the assessment of compound similarity, a Tanimoto kernel has been introduced [90] that is widely applied in LBVS, given the popularity of the Tanimoto coefficient.
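A minimal sketch of a Tanimoto kernel for binary fingerprints, usable as a callable kernel with scikit-learn's SVC (the helper name and settings are illustrative, not taken from [90]):

```python
import numpy as np
from sklearn.svm import SVC

def tanimoto_kernel(X, Y):
    """Tanimoto kernel for binary fingerprint matrices:
    K(x, y) = <x, y> / (<x, x> + <y, y> - <x, y>)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    prod = X @ Y.T
    x_sq = (X * X).sum(axis=1)[:, None]
    y_sq = (Y * Y).sum(axis=1)[None, :]
    denom = x_sq + y_sq - prod
    denom[denom == 0.0] = 1.0  # guard against all-zero fingerprints
    return prod / denom

# scikit-learn accepts a callable that returns the Gram matrix between X and Y.
svm = SVC(kernel=tanimoto_kernel, C=1.0)
```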
A special feature of the SVM approach is that it can not only be used for class label predictions but also for database ranking [87,91,92]. This is accomplished by calculating the signed distance of test compounds from the separating hyperplane. The underlying idea is that the farther test compounds on the active side lie from the hyperplane (and from the negative training examples), the greater the likelihood of activity. Conversely, the farther compounds on the inactive side lie from the hyperplane (and from the positive training examples), the more likely they are to be inactive.
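A sketch of ranking by signed distance from the hyperplane, assuming hypothetical binary fingerprints and activity labels (the data below are random placeholders): in scikit-learn, decision_function returns the signed decision value, which is proportional to the distance from the separating hyperplane and can be used directly to rank a database.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(200, 64))   # hypothetical training fingerprints
y_train = rng.integers(0, 2, size=200)         # 1 = active, 0 = inactive
X_db = rng.integers(0, 2, size=(1000, 64))     # database compounds to be ranked

svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

# Signed decision values: positive on the "active" side of the hyperplane,
# negative on the "inactive" side; larger values suggest a higher likelihood of activity.
scores = svm.decision_function(X_db)
ranking = np.argsort(-scores)  # compound indices from most to least active-like
```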
Similarity searching has also been compared directly to SVM classification and ranking. A noteworthy finding has been that SVM calculations using two-dimensional fingerprints as descriptors typically produce higher compound recall and hit rates than similarity searching with the same fingerprints [87]. Currently, SVM learning is a preferred approach for LBVS based on two-dimensional molecular representations and often yields better results than other methods. Another kernel-based approach adapted for LBVS is binary kernel discrimination [93,94], which also uses fingerprints as descriptors and combines the kernel function methodology with likelihood estimates of compound feature probability distributions.
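The following is a minimal sketch of binary kernel discrimination along the lines described above (the function name, the smoothing parameter lam, and the toy dimensions are assumptions, not the exact formulation of [93,94]): each test compound is scored by the ratio of summed kernel values over active versus inactive training compounds, with a kernel based on the Hamming distance between fingerprints.

```python
import numpy as np

def bkd_scores(X_train, y_train, X_test, lam=0.9):
    """Score test fingerprints by the ratio of summed kernel values over
    active vs. inactive training compounds. The kernel depends on the
    Hamming distance d between bit strings of length n:
        K(x, y) = lam**(n - d) * (1 - lam)**d
    where lam (0.5 < lam < 1) is a smoothing parameter to be tuned."""
    n_bits = X_train.shape[1]
    # Hamming distance between every test and every training fingerprint.
    d = (X_test[:, None, :] != X_train[None, :, :]).sum(axis=2)
    K = lam ** (n_bits - d) * (1.0 - lam) ** d
    active = y_train == 1
    # For long fingerprints the kernel values underflow; a log-space
    # formulation would then be preferable in practice.
    return K[:, active].sum(axis=1) / K[:, ~active].sum(axis=1)
```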
15.8.3 Bayesian Methods
Statistical modeling approaches based on Bayes' theorem [82] have become very popular in LBVS. Bayesian modeling methods for activity prediction generally derive numerical probabilities of compound activity from feature (descriptor value) distributions in training sets. Therefore, Bayesian methods also produce data set rankings. In LBVS, naive Bayesian classifiers using fingerprints as descriptors are widely used [95,96]. This approach assumes that all features are independent of each other (the "naive" assumption) and that numerical descriptor values are normally distributed; both assumptions are only approximations for chemical descriptor data. Feature probabilities are derived from the frequencies observed for positive and negative training examples. Based on the resulting likelihood estimates, a probability of activity is assigned to each test compound according to its descriptor values.
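A minimal sketch of such a naive Bayesian classifier for fingerprint descriptors, here using scikit-learn's BernoulliNB with hypothetical random data as a placeholder: per-bit feature probabilities are estimated from their frequencies among active and inactive training compounds (with Laplace smoothing), and the resulting posterior probability of activity is used to rank the database.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
X_train = rng.integers(0, 2, size=(500, 128))   # hypothetical training fingerprints
y_train = rng.integers(0, 2, size=500)          # 1 = active, 0 = inactive
X_db = rng.integers(0, 2, size=(2000, 128))     # database compounds to be ranked

# Per-bit probabilities are estimated separately for each class from
# training-set frequencies; prediction assumes feature independence.
clf = BernoulliNB(alpha=1.0).fit(X_train, y_train)

p_active = clf.predict_proba(X_db)[:, 1]  # posterior probability of the active class
ranking = np.argsort(-p_active)           # most to least likely active
```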
Despite the underlying approximations, naive Bayesian classifiers have proven to be versatile and robust machine learning approaches for compound classification and hit identification. They have been derived for many different compound activity classes and have also been used for in silico profiling of