15.6.2.3 Fingerprint Engineering Fingerprint engineering is an emerging approach
that attempts to rationally modify fingerprints of constant format in order to further
increase their search performance. For example, molecular complexity effects (i.e.,
the influence on similarity values of the number of bits a compound sets, which tends
to grow with molecular size and complexity) can be eliminated by merging a fingerprint
with its complement [63]. This generates a version that is twice as long as the original
fingerprint but always has a constant bit density of 50%. This type of fingerprint
modification falls within the spectrum of fingerprint engineering approaches. Moreover,
segments from two or more different keyed fingerprints can be engineered into new
fingerprint designs with further improved search performance compared with their
parental fingerprints [64]. For this purpose, feature selection methods are used to
identify the subsets of bit positions in different fingerprints that are most important for
recognizing compounds belonging to a given activity class. The bit segments isolated
from the different fingerprints are then recombined. This approach transforms generally
applicable fingerprints into compound-class-specific search tools.
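The two engineering operations described above can be sketched on fingerprints represented as Python lists of 0/1 values. This is a minimal illustration under stated assumptions, not the method of [63] or [64]: the complement merge follows the description directly (append the bitwise complement, doubling the length and fixing the density at 50%), whereas the bit-scoring rule in `select_bits` (frequency in actives minus frequency in background compounds) is a hypothetical stand-in for the unspecified feature selection methods. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence

Bits = List[int]  # fingerprint as a list of 0/1 bit values


def merge_with_complement(fp: Sequence[int]) -> Bits:
    """Append the bitwise complement, doubling the fingerprint length.

    Each original position is set in exactly one of the two halves,
    so the merged fingerprint always has a bit density of 50%.
    """
    return list(fp) + [1 - b for b in fp]


def bit_density(fp: Sequence[int]) -> float:
    """Fraction of bit positions that are set."""
    return sum(fp) / len(fp)


def select_bits(actives: Sequence[Sequence[int]],
                background: Sequence[Sequence[int]],
                k: int) -> List[int]:
    """Hypothetical feature selection step: score each bit position by how
    much more frequently it is set in actives than in background compounds
    and return the k highest-scoring positions in ascending order."""
    n_bits = len(actives[0])

    def freq(fps: Sequence[Sequence[int]], i: int) -> float:
        return sum(fp[i] for fp in fps) / len(fps)

    scores = [freq(actives, i) - freq(background, i) for i in range(n_bits)]
    # sorted() is stable, so tied positions keep their original order
    top = sorted(range(n_bits), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


def recombine(fps: Sequence[Sequence[int]],
              positions: Sequence[Sequence[int]]) -> Bits:
    """Concatenate the selected bit segments of several fingerprint types
    computed for the same compound into one hybrid fingerprint."""
    hybrid: Bits = []
    for fp, pos in zip(fps, positions):
        hybrid.extend(fp[i] for i in pos)
    return hybrid


# Toy data (made up for illustration)
fp = [1, 0, 0, 0, 1, 0, 0, 0]              # bit density 0.25
merged = merge_with_complement(fp)
print(len(merged), bit_density(merged))    # 16 0.5

actives_a = [[1, 1, 0, 0], [1, 0, 1, 0]]
background_a = [[0, 0, 0, 1], [0, 1, 0, 1]]
pos_a = select_bits(actives_a, background_a, k=2)
print(pos_a)                               # [0, 2]

# Hybrid of fingerprint A (positions pos_a) and fingerprint B (positions 1, 3)
print(recombine([[1, 0, 1, 1], [0, 1, 1, 0]], [pos_a, [1, 3]]))  # [1, 1, 1, 0]
```

Because every merged fingerprint of original length n sets exactly n of its 2n bits, the Tanimoto coefficient of two merged fingerprints reduces to c / (2n - c), where c is the number of shared set bits; the denominator is therefore the same for every compound pair.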
15.7 COMPOUND CLASSIFICATION
In addition to similarity searching, compound classification is another principal
approach to LBVS, for which many conceptually diverse methods have been introduced.
Compound classification approaches typically do not produce compound rankings
but instead classify compounds as active or inactive (a process often referred to as
class label prediction in machine learning). However, there are exceptions, as
discussed below. Many popular compound classification methods are machine learning
approaches. Given the increasing relevance of machine learning for LBVS, these
methods are discussed separately. First, standard compound classification
techniques are reviewed.
15.7.1 Chemical Reference Spaces
Compound classification methods operate in chemical reference spaces, typically
molecular descriptor spaces (where a selection of n numerical descriptors constitutes
an n-dimensional space). A principal challenge of chemical space design for LBVS
is that the selected descriptors must be activity-relevant, that is, capable of
differentiating between compounds having a desired activity and inactive molecules. The
position of a compound in chemical reference space is determined by its descriptor
vector (i.e., its coordinates). In general, the smaller the distance between two
compounds in chemical reference space, the more similar they are. If a compound is
active, another compound close to it should also have a high probability of being
active (provided that the space representation is indeed activity-relevant). Thus, a
general approach to identifying novel active compounds is to project a database into
a chemical space representation, add known active reference molecules to it, and
select candidate compounds that map closely to these references. Therefore, distance
functions have been introduced to mine the neighborhood of active reference compounds
in more or less complex chemical spaces
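As a minimal sketch of this neighborhood mining, the code below represents each compound as a descriptor vector, measures Euclidean distance with `math.dist` (Python 3.8+), and selects database compounds lying within a fixed radius of their nearest active reference. Euclidean distance is only one common choice, since the text does not fix a distance function; the radius, the compound identifiers, the descriptor values, and the function names are hypothetical, and descriptors are assumed to be already scaled to comparable ranges.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]  # descriptor vector, i.e., coordinates in chemical space


def nearest_reference_distance(x: Vector, references: Sequence[Vector]) -> float:
    """Euclidean distance from a compound to its closest active reference."""
    return min(math.dist(x, r) for r in references)


def select_candidates(database: Dict[str, Vector],
                      references: Sequence[Vector],
                      radius: float) -> List[Tuple[str, float]]:
    """Return compounds lying within `radius` of any active reference,
    ordered from closest to most distant."""
    hits = []
    for cid, x in database.items():
        d = nearest_reference_distance(x, references)
        if d <= radius:
            hits.append((cid, d))
    hits.sort(key=lambda item: item[1])
    return hits


# Toy two-dimensional descriptor space (values made up for illustration)
references = [(0.0, 0.0), (5.0, 5.0)]
database = {
    "cpd1": (0.5, 0.0),
    "cpd2": (3.0, 4.0),
    "cpd3": (5.0, 4.0),
    "cpd4": (10.0, 10.0),
}
print(select_candidates(database, references, radius=1.0))
# [('cpd1', 0.5), ('cpd3', 1.0)]
```

Applying a distance threshold corresponds to labeling compounds as candidate actives or not; dropping the threshold and sorting all compounds by `nearest_reference_distance` would instead yield a ranking.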