Information Technology Reference
In-Depth Information
11
Full Perfect Extension Pruning
for Frequent Subgraph Mining
Christian Borgelt 1 and Thorsten Meinl 2
1
European Center for Soft Computing
c/ Gonzalo Gutierrez Quiros s/n, 33600 Mieres, Spain
christian.borgelt@softcomputing.es
2
Nycomed Chair for Bioinformatics and Information Mining
Dept. of Computer and Information Science
University of Konstanz, Box M712, 78457 Konstanz, Germany
thorsten.meinl@uni-konstanz.de
Summary. Mining graph databases for frequent subgraphs has recently developed
into an area of intensive research. Its main goals are to reduce the execution time of
the existing basic algorithms and to enhance their capability to find meaningful graph
fragments. Here we present a method to achieve the former, namely an improvement of
what we called “perfect extension pruning” in an earlier paper [4]. With this method
the number of generated fragments and visited search tree nodes can be reduced, often
considerably, thus accelerating the search. We describe the method in detail and present
experimental results that demonstrate its usefulness.
11.1
Introduction
In recent years the problem how to find common subgraphs in a database of (at-
tributed) graphs, that is, subgraphs that appear with a user-specified minimum
frequency, has gained intense and still growing attention. For this task—which
has useful applications in, for example, biochemistry, web mining, and program
flow analysis—several algorithms have been proposed. Some of them rely on
principles from inductive logic programming and describe the graph structure
by logical expressions [7]. However, the vast majority transfers techniques de-
veloped originally for frequent item set mining. Examples include MolFea [11],
FSG [12], MoSS/MoFa [3] , gSpan [16], Closegraph [17], FFSM [9], and Gas-
ton [14]. A related, but slightly different approach, which is strongly geared
towards graph compression, is used in Subdue [5].
The basic idea of these approaches is to grow subgraphs into the graphs of the
database, adding an edge and maybe a node in each step, counting the number of
graphs containing each grown subgraph, and eliminating infrequent subgraphs.
Unfortunately, with this method the same subgraph can be constructed in several
ways, adding its nodes and edges in different orders. The predominant method
to avoid the ensuing redundant search is to define a canonical form of a graph
that uniquely identifies it up to automorphisms: together with a specific way
of growing the subgraphs it enables us to determine whether a given subgraph
 
Search WWH ::




Custom Search