Full Perfect Extension Pruning for Frequent Subgraph Mining - Mining Complex Data

Information Technology Reference

In-Depth Information

11

Full Perfect Extension Pruning

for Frequent Subgraph Mining

Christian Borgelt 1 and Thorsten Meinl 2

1

European Center for Soft Computing

c/ Gonzalo Gutierrez Quiros s/n, 33600 Mieres, Spain

christian.borgelt@softcomputing.es

2

Nycomed Chair for Bioinformatics and Information Mining

Dept. of Computer and Information Science

University of Konstanz, Box M712, 78457 Konstanz, Germany

thorsten.meinl@uni-konstanz.de

Summary. Mining graph databases for frequent subgraphs has recently developed

into an area of intensive research. Its main goals are to reduce the execution time of

the existing basic algorithms and to enhance their capability to find meaningful graph

fragments. Here we present a method to achieve the former, namely an improvement of

what we called “perfect extension pruning” in an earlier paper [4]. With this method

the number of generated fragments and visited search tree nodes can be reduced, often

considerably, thus accelerating the search. We describe the method in detail and present

experimental results that demonstrate its usefulness.

11.1

Introduction

In recent years the problem how to find common subgraphs in a database of (at-

tributed) graphs, that is, subgraphs that appear with a user-specified minimum

frequency, has gained intense and still growing attention. For this task—which

has useful applications in, for example, biochemistry, web mining, and program

flow analysis—several algorithms have been proposed. Some of them rely on

principles from inductive logic programming and describe the graph structure

by logical expressions [7]. However, the vast majority transfers techniques de-

veloped originally for frequent item set mining. Examples include MolFea [11],

FSG [12], MoSS/MoFa [3] , gSpan [16], Closegraph [17], FFSM [9], and Gas-

ton [14]. A related, but slightly different approach, which is strongly geared

towards graph compression, is used in Subdue [5].

The basic idea of these approaches is to grow subgraphs into the graphs of the

database, adding an edge and maybe a node in each step, counting the number of

graphs containing each grown subgraph, and eliminating infrequent subgraphs.

Unfortunately, with this method the same subgraph can be constructed in several

ways, adding its nodes and edges in different orders. The predominant method

to avoid the ensuing redundant search is to define a canonical form of a graph

that uniquely identifies it up to automorphisms: together with a specific way

of growing the subgraphs it enables us to determine whether a given subgraph

Search WWH ::

Custom Search

Home