Biomedical Engineering Reference
In-Depth Information
descriptors. This setting is common for machine learning approaches. Neural net-
work simulations build descriptor-based models for class label prediction by deriving
pathways through arrays of computational neurons that best distinguish between pos-
itive and negative training examples. Once the model is built, it is used to predict the
class label (active versus inactive) of screening database compounds. However, the
model does not reveal why a compound is predicted to be active or inactive; this
information remains hidden. The same limitation applies to self-organizing maps
(SOMs), a special neural network architecture
designed to map compounds from descriptor reference spaces onto a two-dimensional
neuron grid. The SOM is trained to group positive and negative training examples on
distinct regions of the map and separate them from each other. Then, test compounds
are projected onto the SOM. Because SOMs start from higher-dimensional descriptor
spaces, this approach is also a dimension reduction method. Like other neural
networks, a trained SOM does not reveal why a compound was assigned to an active
region of the neuron grid. By contrast, decision trees separate training compounds
along descriptor pathways. Each descriptor represents a decision point to divide a
learning set along the tree. Typically, a yes/no decision records the presence or
absence of a feature or, alternatively, whether compounds fall into a specific value
range of a chosen numerical descriptor. During training, trees are constructed that
best separate active and inactive compounds
in terminal leaf nodes. A model is derived by recursively partitioning compounds
along a tree that yields a meaningful separation. When such a separation is found,
combinations of selected descriptors form pathways that are signatures of a given
biological activity, whereas other pathways enrich inactive compounds along the
tree. Importantly, pathways in a decision tree are directly interpretable as
feature/value-range sequences that establish classification rules, in contrast to
methods with black-box character.
The tree structure so derived is then used to screen a database for active compounds.
Ensembles of independently derived decision trees, capturing different descriptors
and pathways, are often combined to yield random forest models where predictions
made independently are subjected to consensus scoring schemes [83,84]. Such random
forest models typically improve LBVS performance beyond that of the individual
decision trees.
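The idea of interpretable descriptor pathways and random-forest consensus voting can be sketched in a few lines of Python. The descriptor names, thresholds, and trees below are purely illustrative assumptions, not taken from the text; a real implementation would learn the trees from training data rather than hard-code them.

```python
# Illustrative sketch (hypothetical descriptors and thresholds):
# each tree encodes one descriptor pathway as yes/no and value-range
# decisions; a "forest" combines independent trees by majority vote.

def tree_a(compound):
    if compound["has_aromatic_ring"]:          # yes/no feature decision
        if 1.0 <= compound["logP"] <= 4.0:     # numerical value-range decision
            return "active"
    return "inactive"

def tree_b(compound):
    if compound["h_bond_donors"] <= 5:
        if compound["mol_weight"] < 500.0:
            return "active"
    return "inactive"

def tree_c(compound):
    if compound["has_aromatic_ring"] and compound["mol_weight"] < 450.0:
        return "active"
    return "inactive"

def forest_predict(compound, trees):
    # Consensus scoring: simple majority vote over independent trees.
    votes = [tree(compound) for tree in trees]
    return max(set(votes), key=votes.count)

test_compound = {
    "has_aromatic_ring": True,
    "logP": 2.3,
    "h_bond_donors": 2,
    "mol_weight": 320.0,
}
print(forest_predict(test_compound, [tree_a, tree_b, tree_c]))  # -> active
```

Each pathway here is directly readable as a classification rule (e.g., aromatic ring present and logP between 1 and 4), which is precisely the interpretability advantage of trees over black-box models noted above.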
15.8.2 Support Vector Machines
Currently, the most popular machine learning approaches in LBVS are SVMs and
Bayesian classifiers. SVMs represent a class of algorithms that project training sets that
are not linearly separable into chemical reference spaces of higher dimensionality
where a separating hyperplane can be derived. Thus, SVMs are designed to depart in
the opposite direction from the low-dimensionality paradigm that provides the basis
for cell-based partitioning, as discussed above. In high-dimensional space
representations, SVMs construct a maximum-margin hyperplane that yields the largest
possible distance to the nearest positive and negative training examples. A key aspect
of SVM modeling is that high-dimensional descriptor spaces are never explicitly
constructed and mapped. Rather, a kernel function is applied to determine the degree of
similarity between compounds in a higher-dimensional representation. For example,
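A minimal toy illustration of this kernel idea, assuming a degree-2 polynomial kernel over two-dimensional descriptor vectors (an assumed example, not from the text): the kernel value computed in the original low-dimensional space equals a dot product in an explicitly mapped higher-dimensional feature space, so the mapping itself never has to be carried out.

```python
import math

# Degree-2 polynomial kernel k(x, y) = (x . y)^2,
# evaluated entirely in the ORIGINAL 2-D descriptor space.
def poly_kernel(x, y):
    dot = x[0] * y[0] + x[1] * y[1]
    return dot ** 2

# The corresponding explicit 3-D feature map
# phi(x) = (x1^2, x2^2, sqrt(2) * x1 * x2).
def explicit_map(x):
    return (x[0] ** 2, x[1] ** 2, math.sqrt(2) * x[0] * x[1])

def dot3(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x, y = (1.0, 2.0), (3.0, 0.5)
# Both routes give the same similarity value (16.0), but the kernel
# never constructs the higher-dimensional representation.
print(poly_kernel(x, y), dot3(explicit_map(x), explicit_map(y)))
```

This is why SVMs scale to very high-dimensional (even implicit, infinite-dimensional) descriptor spaces: only pairwise kernel values between compounds are ever computed.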