Biomedical Engineering Reference
In-Depth Information
three-dimensional database searching, pharmacophore fingerprinting does not rely
on bioactive compound conformations, the prediction of which represents a major
caveat in ligand-based drug design and LBVS.
15.6.2 Two-Dimensional Fingerprints
15.6.2.1 Fingerprint Design and Comparison
Two-dimensional fingerprints are
also popular descriptors and search tools for LBVS. Fingerprints of very different
design, complexity, and length have been introduced over the years, including,
among others, atom pairs [52] and fragment sets [53], topological pathways through
molecules [54], and combinatorial fingerprints [55,56]. The latter capture layered
atom environments in a compound-specific manner. In addition, other fingerprints
monitor two-dimensional pharmacophore features such as atom triplets or quadruplets
[22], in analogy to three-dimensional pharmacophore fingerprints. Combinatorial
fingerprints, in particular extended connectivity fingerprints [56], have become
increasingly popular in recent years and are currently among the best-performing
fingerprint descriptors [22,23]. In addition, new types of fingerprints continue to
be introduced (albeit not very frequently), such as bonded atom pairs accounting
for short-range atom environments [57] or topology fingerprints that systematically
enumerate subgraphs up to a predefined size [58]. Fingerprints are either keyed or
hashed in their design. In keyed fingerprints, each bit position is associated with a
specific feature and monitors its presence or absence (or sometimes also its count)
in a molecule. By contrast, in hashed fingerprints, features are mapped to overlapping
bit segments, so individual bit positions cannot be interpreted in chemical terms. Regardless
of whether two- or three-dimensional fingerprints are keyed or hashed, their overlap
is quantified using similarity metrics, the most popular being the Tanimoto coefficient
[35], as mentioned earlier. Similarity coefficient values typically range from 0
to 1 and provide a basis for database ranking according to decreasing similarity to
reference compounds.
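
The keyed and hashed designs and the Tanimoto-based ranking described above can be sketched as follows. This is a minimal illustration, not the implementation behind any fingerprint cited in the text: the feature names, the key dictionary `KEYS`, the compound labels, the hashing scheme, and all function names are assumptions made for the example. Fingerprints are stored as sets of on-bit positions, and the Tanimoto coefficient is computed as c / (a + b - c), where a and b are the numbers of bits set in each fingerprint and c is the number of bits they share.

```python
import hashlib


def keyed_fingerprint(features, key_dictionary):
    """Keyed design: bit i is set only if feature key_dictionary[i] is present."""
    return {i for i, key in enumerate(key_dictionary) if key in features}


def hashed_fingerprint(features, n_bits=64, bits_per_feature=2):
    """Hashed design: each feature sets a few hash-derived bit positions.

    Different features may collide on the same position, which is why
    single bits of a hashed fingerprint have no chemical interpretation.
    """
    on_bits = set()
    for feature in features:
        for salt in range(bits_per_feature):
            digest = hashlib.sha256(f"{salt}:{feature}".encode()).digest()
            on_bits.add(int.from_bytes(digest[:4], "big") % n_bits)
    return on_bits


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient c / (a + b - c) for two sets of on-bit positions."""
    if not fp_a and not fp_b:
        return 0.0  # convention assumed here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def rank_database(reference, database):
    """Return (name, similarity) pairs sorted by decreasing similarity."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical key dictionary and compounds, for illustration only.
KEYS = ["aromatic ring", "carboxylic acid", "amine", "hydroxyl", "halogen"]
reference = keyed_fingerprint({"aromatic ring", "amine", "hydroxyl"}, KEYS)
database = {
    "cpd_A": keyed_fingerprint({"aromatic ring", "amine"}, KEYS),
    "cpd_B": keyed_fingerprint({"carboxylic acid", "halogen"}, KEYS),
    "cpd_C": keyed_fingerprint(
        {"aromatic ring", "amine", "hydroxyl", "halogen"}, KEYS
    ),
}
ranking = rank_database(reference, database)
```

In this toy example, `cpd_C` shares three of four combined bits with the reference (0.75) and ranks ahead of `cpd_A` (2/3) and `cpd_B` (0). The same `tanimoto` function applies unchanged to the output of `hashed_fingerprint`, since it only compares bit positions.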
For the use of multiple reference compounds, data fusion methods that further
increase compound recall have been investigated intensely. Data fusion can in principle
be applied to different similarity metrics (which is not often done), different
fingerprints, or rankings obtained for multiple reference compounds (using the same
metric and fingerprint). The latter approach is most widely applied in similarity
searching and involves rank averaging or maximum rank selection [59,60].
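
Rank-based fusion over multiple reference compounds can be sketched as follows. This is a hedged illustration under stated assumptions rather than the procedure of the cited studies: the function names and similarity values are hypothetical, rank 1 denotes the most similar database compound, and "maximum rank selection" is interpreted here as keeping, for each database compound, its best (numerically smallest) rank over all reference compounds. Ties in the fused value are broken alphabetically, which is an arbitrary choice made only to keep the output deterministic.

```python
def ranks_from_scores(scores):
    """Map each database compound to its rank (1 = most similar)
    in the search with one reference compound."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def fuse_ranks(per_reference_scores, mode="mean"):
    """Fuse the rankings obtained for several reference compounds.

    per_reference_scores: list of {compound: similarity} dicts, one per
    reference compound, all covering the same database compounds.
    mode="mean": rank averaging.
    mode="best": keep each compound's best rank over all references.
    Returns database compounds ordered from best to worst fused rank.
    """
    rank_tables = [ranks_from_scores(s) for s in per_reference_scores]
    compounds = list(rank_tables[0])
    if mode == "mean":
        fused = {
            c: sum(table[c] for table in rank_tables) / len(rank_tables)
            for c in compounds
        }
    elif mode == "best":
        fused = {c: min(table[c] for table in rank_tables) for c in compounds}
    else:
        raise ValueError(f"unknown fusion mode: {mode!r}")
    return sorted(compounds, key=lambda c: (fused[c], c))


# Hypothetical similarity values for four database compounds searched
# with two reference compounds (same metric and fingerprint).
scores_ref1 = {"A": 0.9, "B": 0.5, "C": 0.4, "D": 0.1}
scores_ref2 = {"A": 0.2, "B": 0.5, "C": 0.3, "D": 0.8}

mean_order = fuse_ranks([scores_ref1, scores_ref2], mode="mean")
best_order = fuse_ranks([scores_ref1, scores_ref2], mode="best")
```

With these values, rank averaging places `B` first because it ranks second for both references, whereas best-rank selection favors `A` and `D`, each of which is top-ranked for one reference.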
15.6.2.2 Complexity Effect
A known conundrum of fingerprint searching is the
molecular complexity effect [25,26]. Increasingly large and topologically complex
molecules have a tendency to increase the bit density of fingerprints, regardless of
their specific chemical features, which in turn causes a statistical tendency for larger
similarity values in database searching, as illustrated in Figure 15.4. This statistical
tendency favors false-positive detections and, consequently, reduces the recall of
active compounds in benchmark calculations. In practical applications, it leads to
similarity values of many database compounds that are comparable in magnitude to
those of potential hits, which increases the background noise of the calculations and