Computational Approaches to Fragment and Substructure Discovery and Evaluation - Fragment-Based Drug Discovery

Chemistry Reference

In-Depth Information

steps is exponentially proportional to graph size, which contributes to the inefficiency of

an algorithm. Therefore, most algorithms seek ways to avoid graph/subgraph isomorphism

tests as much as possible.

Starting from an empty fragment, all possible fragment extensions (refinements) are

generated, a process that will be explained below for the simple amino acid alanine. This

is done by recursively adding edges and nodes to already generated fragments. In case

of a ring closure, only an edge is added. Generated fragments are compared against the

graphs in the database to check whether they occur. New refinements can only appear

in those graphs that already hold the original fragment. Accordingly, the algorithm keeps

appearance lists to restrict isomorphism testing to the graphs in the lists only. The support

for a fragment is the proportion, or percentage, of graphs in the database in which it

occurs. Obviously, found fragments are more relevant if they occur in at least a given

minimum number/fraction of molecules. This minimum is called the minimum support

value. Fragments are discarded if they occur in fewer molecules than allowed for the

minimum support value, which is related to the significance of the found fragments. In

general, lower minimum support values will yield higher numbers of fragments. Choosing

a sufficiently high minimum support value will result in a comprehensible number of

fragments while mining is completed within a reasonable time-scale. By definition, the

support value of a fragment never exceeds the support values of the fragments it contains.

This restricts refinement generation further, starting only from fragments with sufficient

support (cf. A priori rule [ 6 ] ). To focus isomorphism testing, fragment-mining algorithms

may keep a mapping of the nodes and edges of a fragment to the corresponding nodes and

edges of the graph in which it occurs. This is known as an embedding.

To illustrate the process, let us consider a graphmining experiment on amolecule database

with alanine (Figure 8.3). For a single molecule in the database, such as alanine, a search

tree can be constructed of all possible fragments. Figure 8.4 shows all these fragments for

alanine with hydrogen atoms omitted as discussed before. On top is an empty fragment

and each following fragment is a substructure of its descendants below. Fragments on the

same level (six in total) have the same number of bonds (edges). For instance, the first level

contains the elements N, O and C, since these are the constituents of the molecule. The C-C

fragment on the second level forms the common core for the C-C-N, C-C-C, C-C

O and

C-C-O fragment on the third level. The arrows indicate the paths leading from an empty

fragment to the complete structure, yielding one extension at a time.

=

OH

H 2 N

O

Figure 8.3 Chemical structure of alanine. Implicit hydrogens are omitted.

There are two ways to travel the subgraph lattice as the entire scheme in Figure 8.4 is

called: breadth-first and depth-first. A breadth-first search considers all refinements at the

same level before advancing to the next. For Figure 8.4, this means stepping through the

lattice one row of fragments at a time. Storage requirements are proportional to the maximal

Fragment-Based Drug Discovery

Search WWH ::

Custom Search

Home