Biomedical Engineering Reference
where $p_i$ is the relative frequency of the cyclic system $i$ in a population of $P$ compounds
containing a total of $n$ distinct cyclic systems (i.e., $p_i = c_i/P$), and $c_i$ is the absolute number
of molecules containing a particular cyclic system $i$. The values of SE range from
0 to $\log_2 n$ and hence depend on $n$, but not explicitly on $P$. If SE = 0, all $P$ compounds
contain the same single cyclic system. If SE = $\log_2 n$, the $P$ compounds are distributed
uniformly among the $n$ cyclic systems (i.e., maximum cyclic system diversity in the
data set). To normalize the SE values for different values of $n$, the scaled SE (SSE) is
defined as
$$\mathrm{SSE} = \frac{\mathrm{SE}}{\log_2 n} \tag{10.2}$$
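The relationship between the counts $c_i$, SE, and SSE can be sketched in a few lines of Python. This is a minimal sketch, not the implementation used in the cited studies. It assumes SE has the standard Shannon form $\mathrm{SE} = -\sum_{i=1}^{n} p_i \log_2 p_i$ with $p_i = c_i/P$, and the function names `shannon_entropy` and `scaled_shannon_entropy` are hypothetical. The two calls at the end illustrate the limiting behavior described above: a uniform distribution gives SSE = 1.0, and a single dominant cyclic system pulls SSE toward 0.

```python
import math
from typing import Sequence


def shannon_entropy(counts: Sequence[int]) -> float:
    """Shannon entropy (bits) of compounds distributed over cyclic systems.

    counts[i] plays the role of c_i; P = sum(counts); p_i = c_i / P.
    Assumes the standard form SE = -sum_i p_i * log2(p_i).
    """
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    total = sum(counts)  # P, the number of compounds
    if total == 0:
        raise ValueError("at least one compound is required")
    se = 0.0
    for c in counts:
        if c > 0:  # a term with p_i = 0 contributes nothing (p log p -> 0)
            p = c / total
            se -= p * math.log2(p)
    return se


def scaled_shannon_entropy(counts: Sequence[int]) -> float:
    """SSE = SE / log2(n), where n = len(counts) cyclic systems (Eq. 10.2)."""
    n = len(counts)
    if n < 2:
        raise ValueError("SSE is undefined for n < 2 because log2(n) = 0")
    return shannon_entropy(counts) / math.log2(n)


if __name__ == "__main__":
    print(scaled_shannon_entropy([25, 25, 25, 25]))  # uniform: 1.0
    print(scaled_shannon_entropy([70, 10, 10, 10]))  # one dominant system: between 0 and 1
```

Dividing by $\log_2 n$ is what makes SSE comparable across data sets with different numbers of cyclic systems, since $\log_2 n$ is the largest SE attainable for a given $n$.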
The values of SSE range from 0, where all $P$ compounds are contained in one
cyclic system, to 1.0, where each cyclic system contains an equal number of
compounds. Thus, SSE values closer to 1.0 indicate large scaffold diversity within the
$n$ most populated cyclic systems. This measure is discussed extensively elsewhere
[116] and has been used recently to characterize the scaffold diversity of natural prod-
ucts in the TCM database implemented in ZINC, natural products from a commercial
vendor, approved drugs, and other compound collections [72]. As an example, Fig-
ure 10.8 shows the scaffold diversity of four compound collections using the measure
of Shannon entropy. The compound databases, also analyzed above (see Figure 10.5),
include approved drugs, natural products from a commercial vendor, diverse
compounds synthesized by academic groups using methods such as DOS, and commercial
compounds. The figure shows the distribution of compounds in the 10 most popu-
lated scaffolds (cyclic systems), which were computed with MEQI. Each panel also
shows the corresponding value of scaled Shannon entropy. As discussed above, SSE
values closer to 1.0 indicate that the molecules are more evenly distributed among the
cyclic systems and hence signal high scaffold diversity. Conversely, smaller SSE values
indicate that most of the molecules are concentrated in a few cyclic systems, denoting
lower diversity. From the SSE values for the top 10 most populated cyclic systems in
Figure 10.8, it can be concluded that natural products and the synthetic compounds
from academic groups are the most diverse, with the largest SSE values (0.94 and
0.95, respectively). In contrast, the commercial database showed the lowest scaffold
diversity (SSE = 0.74) according to this measure.
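The analysis behind Figure 10.8 restricts SSE to the 10 most populated cyclic systems. Below is a minimal sketch of that step, assuming each compound has already been assigned a scaffold label (in the text the cyclic systems were computed with MEQI; here the labels are arbitrary hashable values, and the function name `top_n_scaled_entropy` is hypothetical). It also assumes that $P$ counts only the compounds in the retained systems, consistent with the definition of $p_i$ given above.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def top_n_scaled_entropy(scaffolds: Iterable[Hashable], n: int = 10) -> float:
    """SSE over the n most populated cyclic systems (hypothetical helper).

    scaffolds: one cyclic-system label per compound, assumed precomputed.
    """
    # Count compounds per cyclic system and keep the n most populated ones.
    # Counter.most_common orders tied counts by first occurrence.
    counts = [c for _, c in Counter(scaffolds).most_common(n)]
    k = len(counts)  # may be smaller than n if the data set has fewer systems
    if k < 2:
        raise ValueError("need at least two cyclic systems (log2 k must be > 0)")
    total = sum(counts)  # P: compounds contained in the retained systems only
    se = -sum((c / total) * math.log2(c / total) for c in counts)
    return se / math.log2(k)


if __name__ == "__main__":
    # Hypothetical scaffold labels for ten compounds.
    library = ["benzene"] * 6 + ["pyridine"] * 2 + ["indole"] * 2
    print(round(top_n_scaled_entropy(library, n=10), 3))
```

Because `Counter.most_common` breaks ties by first occurrence, systems tied at the cutoff are kept or dropped according to input order; a different tie-breaking rule could shift SSE slightly for data sets with many equally populated scaffolds.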
10.4.4 Structure Fingerprints and Multiple Representations
The chemical space of compound databases is usually compared using a single
representation. However, as discussed above, there is no unique chemical space, since
any chemical space depends on the molecular descriptors used to define it. To address
this dependence of chemical space on the representation, it has been proposed to combine
the conclusions obtained from multiple methods (e.g., multiple fingerprint representations).
The combination or aggregation of methods is a common practice in consensus scoring [123]
for modeling receptor-ligand interactions, data fusion for similarity searching [124],
consensus activity landscape modeling in SAR analysis [125,126], and clustering [127]. The dependence of