Chemistry Reference
In-Depth Information
O
O
NH
N
H
O
OH
H 2 N
O
O
O
Figure 8.1 Example structure taken from the PubChem compound database. [1] IUPAC name
4-{(2 S )-2-acetamido-2-[[(2 S )-10-carbamoyl-9-(cyclohexylmethoxy)-2-bicyclo[5.4.0]undeca-
7,9,11-trienyl]carbamoyl]ethyl}-2-formylbenzoic acid, PubChem CID 9959891.
Figure 8.2 Graph representation of the example structure in Figure 8.1. Nodes (black dots)
represent the atoms and edges (solid lines) represent the bonds. Note that standard graph
representation disregards any extra information such atom type or bond order.
8.2.2
Substructure Methods
Frequent subgraph mining. Graph-based data mining aims to find interesting patterns in
graph data. It has a variety of applications, such as analysis of literature citation networks,
weblogs andweb searches. Frequent subgraphmining is the process of finding all frequently
recurring topological patterns in a database. In drug discovery, frequent subgraph mining,
or fragment mining in the context of molecular databases, can be used to find structural
patterns that are frequent in one class of compounds and infrequent in another. First, the
general procedure of subgraph mining will be described. After that, a number of algorithms
and tools for molecular fragment mining will be presented.
To find the frequently occurring fragments in a set of graphs, a typical algorithm would
enumerate all possible fragments that exist in the set and find for each fragment the graphs
in which it occurs. The frequency of a fragment is the number of graphs in which it occurs.
The process of testing whether a fragment is part of a graph is called subgraph isomorphism
testing. It searches the graph for a subgraph that is isomorphic to the fragment. A typical
example is the ethyl fragment (C-C) in n -propane (C-C-C); it occurs twice and the one is
'isomorphic' to the other. In terms of computing steps, graph/subgraph isomorphism tests
are relatively costly. This translates to prolonged computing time or memory requirements.
It is one of the key issues in graph mining since there currently exist no efficient algorithms
for isomorphism testing on general graphs. In the worst case, the number of computing
Search WWH ::




Custom Search