[Figure 8.2: two bar-chart panels (vertical axes 0 to 30 and 0 to 12), each
comparing Tanimoto and MG on the COX2, A1A, CDK2, FXa, MAO, and PDE5 datasets.]
Figure 8.2 Performance of indirect similarity measures (MG) as compared
to similarity searching using the Tanimoto coefficient.
which we tested our method. It can also be observed that indirect similarity
outperforms direct similarity for active compound retrieval in all datasets
except MAO. Moreover, the relative gains achieved by indirect similarity for
the task of identifying active compounds with different scaffolds are much
higher, indicating that it performs well in identifying compounds that have
similar biomolecule activity even when their direct similarity is low.
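The direct similarity baseline in this comparison is the Tanimoto coefficient. The following is a minimal sketch of Tanimoto-based similarity searching, assuming each compound is represented as a set of descriptor identifiers. The function names, compound ids, and descriptor labels are hypothetical and chosen only for illustration; they are not the representation or code used in the experiments.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two descriptor sets: |A & B| / |A | B|."""
    union = len(a | b)
    if union == 0:
        # Convention assumed here: two empty descriptor sets have similarity 0.
        return 0.0
    return len(a & b) / union


def rank_by_similarity(query: set, library: dict) -> list:
    """Return library compound ids sorted by decreasing similarity to the query."""
    return sorted(library,
                  key=lambda cid: tanimoto(query, library[cid]),
                  reverse=True)


# Hypothetical descriptor sets for a query and a three-compound library.
query = {"f1", "f2", "f3"}
library = {
    "c1": {"f1", "f2", "f3", "f4"},  # shares 3 of 4 descriptors with the query
    "c2": {"f3", "f5"},              # shares 1 of 4
    "c3": {"f6"},                    # shares none
}
ranking = rank_by_similarity(query, library)
```

A compound with a different scaffold typically shares few descriptors with the query and therefore ranks low under this direct measure, which is the situation in which the indirect measure is reported to help most.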
8.3.4 Classification Algorithms for Chemical Compounds
We have developed structure-based prediction models for classifying
compounds into various classes of interest (e.g., active/inactive,
toxic/non-toxic). These models were based on support vector machines (SVMs)
and utilized novel kernel functions to determine the similarity between a
pair of compounds [11, 18]. These kernel functions were developed by
representing the structure of each compound as a vector in a high-dimensional
descriptor space whose dimensions corresponded to two- or three-dimensional
structures present in the compounds. The descriptor spaces that we developed
and studied include the FS [18] and GF [11] descriptors and a descriptor
based on frequently occurring geometric subgraphs that were discovered
automatically from the predicted three-dimensional structure of the
compounds [22]. We studied the
performance of different kernel functions, including linear, radial basis
function, and Tanimoto kernels [11, 18]. We also developed novel extensions
to existing kernel functions for the GF descriptors that take into account
the size of the different descriptors. The extended kernel calculates a
separate similarity value for the fragments of each size and then combines
these values to yield a single similarity value. This approach leads to
better results for GF descriptors in the context of SVMs [11]. Although SVMs
are quite effective for high-dimensional datasets, the selection of the most
relevant features has been shown to yield