One popular fingerprint algorithm [3] produces fragments as
unbranched chains of atoms. This approach is typically called a path-based
method because the algorithm follows continuous paths of bonded atoms.
Another algorithm produces branched fragments centered on each atom, creating
ever-expanding neighborhoods of atoms around each central atom. This
is typically called a circular fingerprint [4].
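The difference between the two fragmentation styles can be illustrated with a small Python sketch. It is not either of the cited algorithms: the toy molecule (the heavy atoms of 2-propanol), the function names linear_paths and circular_environments, and the string encodings of fragments are all assumptions made for illustration, and bond orders, rings, and atom properties are ignored.

```python
# Toy molecule (assumed for illustration): 2-propanol heavy atoms, C0-C1(-O3)-C2.
atoms = {0: "C", 1: "C", 2: "C", 3: "O"}
bonds = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}


def linear_paths(atoms, bonds, max_len):
    """Enumerate unbranched paths of up to max_len atoms as element strings."""
    found = set()

    def walk(path):
        symbols = [atoms[i] for i in path]
        forward = "-".join(symbols)
        backward = "-".join(reversed(symbols))
        # A path and its reverse are the same fragment; keep one spelling.
        found.add(min(forward, backward))
        if len(path) == max_len:
            return
        for nxt in bonds[path[-1]]:
            if nxt not in path:  # never revisit an atom on the same path
                walk(path + [nxt])

    for start in atoms:
        walk([start])
    return found


def circular_environments(atoms, bonds, radius):
    """Describe each atom's neighborhood, shell by shell, out to radius bonds."""
    envs = set()
    for center in atoms:
        shell, seen = {center}, {center}
        layers = [atoms[center]]
        for _ in range(radius):
            # The next shell is every unseen neighbor of the current shell.
            shell = {n for a in shell for n in bonds[a]} - seen
            if not shell:
                break
            seen |= shell
            layers.append("".join(sorted(atoms[i] for i in shell)))
        envs.add("|".join(layers))
    return envs


print(sorted(linear_paths(atoms, bonds, 3)))
# ['C', 'C-C', 'C-C-C', 'C-C-O', 'C-O', 'O']
print(sorted(circular_environments(atoms, bonds, 1)))
# ['C|C', 'C|CCO', 'O|C']
```

The path-based output contains only chains, while the circular output records the branched environment of the central carbon (C|CCO) as a single fragment.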
Regardless of the method used to fragment the structure, the hashed
fingerprint of each fragment is combined with hashed fingerprints for other
fragments from the same structure to produce an overall fingerprint for the
structure. This bit string is used in an equivalent way to the fragment keys
above to prescreen rows of structures during a substructure match. The
Appendix shows two functions to compute a fingerprint bit string.
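As a rough illustration of how fragment hashes are combined and then used as a prescreen, the following Python sketch may help. It is not the Appendix code: the function names fragment_bit, fingerprint, and may_contain, the use of SHA-1 as the hash, and the 64-bit length are assumptions chosen only to keep the example small.

```python
import hashlib


def fragment_bit(fragment, nbits):
    """Map a fragment string to a bit position using a stable hash."""
    digest = hashlib.sha1(fragment.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % nbits


def fingerprint(fragments, nbits=64):
    """OR one bit per fragment into a single integer used as a bit string."""
    fp = 0
    for frag in fragments:
        fp |= 1 << fragment_bit(frag, nbits)
    return fp


def may_contain(candidate_fp, query_fp):
    """Prescreen: every bit set in the query must also be set in the candidate."""
    return (candidate_fp & query_fp) == query_fp


# Fragment strings are hypothetical; any fragmentation method could supply them.
structure_fp = fingerprint(["C", "O", "C-C", "C-O", "C-C-C", "C-C-O"])
query_fp = fingerprint(["O", "C-O"])

# The query fragments are a subset of the structure's fragments, so the
# prescreen passes and the full substructure match would still be run.
print(may_contain(structure_fp, query_fp))  # True
```

Because different fragments can hash to the same bit, a passing prescreen only means the structure might match; a failing prescreen means it cannot, which is what makes the test useful for skipping rows.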
8.4 Similarity Measures
Besides serving as a prescreen to speed up substructure matches,
fingerprints and fragment keys can be used in other ways. The bit patterns
for two molecular structures can be compared by considering bits they
have in common due to common fragments. Bits not in common are due
to fragments in one structure not appearing in the other structure. There
are many ways to combine the counts of common bits, differing bits, and
bit string length to produce a numerical measure of the similarity of one
structure to another. One popular measure is the Tanimoto index [5]. Given a
fingerprint or fragment key for each of structures A and B, the Tanimoto index
is the ratio of the number of bits A and B have in common to the number of
bits set in either A or B, that is, the number of bits set for A plus the
number of bits set for B minus the number of bits in common. An SQL
definition for the Tanimoto index is as follows:
-- Tanimoto index: bits in common divided by bits set in either argument.
Create Function tanimoto(bit, bit) Returns Real As
'Select nbits_set($1 & $2)::real /
(nbits_set($1) + nbits_set($2) - nbits_set($1 & $2))::real;'
Language SQL;
The & (bitwise AND) operator and the ~ (bitwise NOT) operator are used
along with a nonstandard SQL function, nbits_set. This function and
other related similarity functions are contained in the Appendix. The
suitability of fragment keys and of path-based or circular fingerprints for
any particular purpose is the subject of ongoing research [6].
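A worked example may make the Tanimoto arithmetic concrete. The Python sketch below mirrors the definition given above, with fingerprints held as integers; the Python nbits_set here is a stand-in written for this example, not the Appendix function, and returning 0.0 when neither fingerprint has any bits set is an assumption (the ratio is undefined in that case).

```python
def nbits_set(bits):
    """Count the 1 bits in an integer used as a bit string."""
    return bin(bits).count("1")


def tanimoto(a, b):
    """Bits in common divided by bits set in either fingerprint."""
    common = nbits_set(a & b)
    either = nbits_set(a) + nbits_set(b) - common
    # Assumption: two empty fingerprints are scored 0.0 rather than raising.
    return common / either if either else 0.0


a = 0b11010110  # 5 bits set
b = 0b10010011  # 4 bits set
# a & b = 0b10010010, so 3 bits are in common: 3 / (5 + 4 - 3) = 0.5
print(tanimoto(a, b))  # 0.5
print(tanimoto(a, a))  # 1.0
```

Identical fingerprints score 1.0 and fingerprints with no bits in common score 0.0, so the index can be used directly to rank structures by similarity to a query.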
8.5 Computing Fragment-Based Properties
The methods shown above to compute fragment keys can be extended to
compute fragment-based properties of molecules. The use of a relational
table to define the fragments makes it natural to define the computation
as an SQL function. Rather than having the fragment parameters