typically utilize similarity values between a pair of compounds. This similarity
value is usually computed over a suitable descriptor-space representation 4
of chemical compounds, which is typically derived from the two-dimensional
topological molecular graph of the chemical compounds. It has been shown
that when this similarity is high, these two-dimensional descriptor-based
methods are very effective in finding compounds that share similar activity
against a biomolecular target. 2 However, the task of identifying hit compounds
is complicated by the fact that the query might have undesirable properties:
it may be toxic, have poor ADME (absorption, distribution, metabolism, and
excretion) properties, or be promiscuous. 2 These properties will also be shared
by most of the compounds similar to the query, as they will correspond to very
similar structures. To overcome this problem, it is important to identify
(i.e., rank high) as many chemical compounds as possible that not only
show the desired activity for the biomolecular target but also have different
structures (i.e., come from diverse chemical classes or chemotypes). Finding novel
chemotypes using the information of already known bioactive small molecules
is termed scaffold hopping. 2
We developed techniques, 21 inspired by research in social network analysis,
that measure the similarity between the query and a compound by taking
into account additional information beyond their direct descriptor-space-based
representation. These techniques derive indirect similarities by analyzing the
network connecting the query and the library compounds. This network is
determined using an undirected k-nearest-neighbor graph (NG) and an
undirected k-mutual-nearest-neighbor graph (MG). Both of these graphs contain
a node for each of the compounds as well as a node for the query. However,
they differ in the set of edges that they contain. In the k-nearest-neighbor
graph there is an edge between a pair of nodes corresponding to compounds
c_i and c_j if c_i is in the k-nearest-neighbor list of c_j or vice versa.
In the k-mutual-nearest-neighbor graph, an edge exists only when c_i is in the
k-nearest-neighbor list of c_j and c_j is in the k-nearest-neighbor list of c_i.
The indirect similarity between a pair of nodes is computed as the Tanimoto
coefficient of their adjacency lists, which assigns a high similarity value to
a pair of compounds if they have a large number of common similar compounds.
Thus, the indirect similarity between a pair of compounds will be high if there
are a large number of paths of length two connecting them in the network.
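
The graph construction and the indirect similarity described above can be
illustrated with a minimal Python sketch. It is not the implementation used
in the cited work: the toy feature sets, the names knn_lists, build_graphs,
and indirect_similarity, the choice of k = 2, and the alphabetical
tie-breaking are all assumptions made for illustration. The text also does not
say whether a node belongs to its own adjacency list; the sketch assumes it
does not.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto coefficient of two sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def knn_lists(features, k):
    """For each node, the set of its k most similar other nodes (direct similarity)."""
    lists = {}
    for u in features:
        others = [v for v in features if v != u]
        # Highest similarity first; ties broken by name (an assumption of this sketch).
        others.sort(key=lambda v: (-tanimoto(features[u], features[v]), v))
        lists[u] = set(others[:k])
    return lists

def build_graphs(features, k):
    """Return adjacency sets of the k-nearest-neighbor graph (NG) and the
    k-mutual-nearest-neighbor graph (MG)."""
    nn = knn_lists(features, k)
    ng = {u: set() for u in features}
    mg = {u: set() for u in features}
    for u, v in combinations(features, 2):
        u_lists_v = v in nn[u]
        v_lists_u = u in nn[v]
        if u_lists_v or v_lists_u:    # NG: either direction is enough
            ng[u].add(v)
            ng[v].add(u)
        if u_lists_v and v_lists_u:   # MG: both directions are required
            mg[u].add(v)
            mg[v].add(u)
    return ng, mg

def indirect_similarity(graph, u, v):
    """Tanimoto coefficient of the adjacency lists of u and v."""
    return tanimoto(graph[u], graph[v])

# Hypothetical descriptor sets (e.g., indices of fingerprint bits that are set).
features = {
    "query": {1, 2, 3, 4},
    "a": {1, 2, 3, 5},
    "b": {2, 3, 5, 6},
    "c": {5, 6, 7, 8},
    "d": {6, 7, 8, 9},
}
ng, mg = build_graphs(features, k=2)

# "c" shares no descriptor with the query, so its direct similarity is zero,
# yet both are linked to "b" in the NG, so their indirect similarity is not.
direct_qc = tanimoto(features["query"], features["c"])
indirect_qc = indirect_similarity(ng, "query", "c")
```

In this toy example the MG is sparser than the NG because it drops every edge
that is supported in only one direction. Two nodes that are adjacent but share
no other neighbor receive an indirect similarity of zero under the no-self-loop
assumption; adding each node to its own adjacency list would be one way to
change that behavior.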
The performance of indirect similarity-based retrieval strategies based
on the NG as well as the MG graph was compared with direct similarity based on
the Tanimoto coefficient. 21 The compounds were represented using different
descriptor spaces (GF, ECFP, etc.). The quantitative results showed that
indirect similarity is consistently, and in many cases substantially, better than
direct similarity. Figure 8.2 shows a part of our results in which we compare
MG-based indirect similarity to direct Tanimoto-coefficient similarity
searching using ECFP descriptors. It can be observed from the figure that indirect
similarity outperforms direct similarity for scaffold-hopping active retrieval
in five out of six datasets (COX2, A1A, CDK2, FXa, MAO, and PDE5) on