Algorithm 15 (fLeap) outlines the frequency-descending strategy. It starts with
the highest frequency threshold and then lowers the threshold until the objective
score of the best graph pattern converges. Line 5 executes a frequent subgraph mining
routine, fpmine, which could be FSG [29], gSpan [45], etc. fpmine selects the most
significant graph pattern g from the frequent subgraphs it mines. Line 6 implements
a simple frequency-descending method.
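The loop can be sketched as follows. This is only an illustration: fpmine is treated as a black box, the scoring function is abstract, and the halving schedule on line 6's "simple frequency descending method" is an assumption, since the text does not specify the schedule.

    def fleap(dataset, fpmine, score, eps=1e-3):
        # Hedged sketch of the frequency-descending strategy (fLeap).
        # fpmine(dataset, theta) is assumed to run a frequent subgraph miner
        # (e.g., FSG or gSpan) at frequency threshold theta and return the
        # most significant pattern among the frequent subgraphs it mines.
        theta = 1.0                      # start with the highest threshold
        g = fpmine(dataset, theta)
        while theta > 0:
            theta /= 2                   # a simple descending schedule (assumed)
            g_next = fpmine(dataset, theta)
            if score(g_next) - score(g) < eps:
                break                    # objective score has converged
            g = g_next
        return max(g, g_next, key=score)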
Descending Leap Mine With structural leap search and frequency-descending min-
ing, a general mining pipeline is built for mining significant graph patterns in a
complex graph dataset. It consists of the following three steps (a sketch of the full
pipeline follows the list).
1. Perform structural leap search with threshold θ = 1.0, generating an optimal
pattern candidate g.
2. Repeat frequency-descending mining with structural leap search until the objective
score of g converges.
3. Take the best score discovered so far; perform structural leap search again (leap
length σ) without frequency threshold; output the discovered pattern.
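Putting the three steps together gives the sketch below. The leap_search interface, including its lower_bound argument for seeding the search with the best score found so far, is hypothetical; the text names the steps but not the exact signatures.

    def descending_leap_mine(dataset, leap_search, score, sigma, eps=1e-3):
        # Hedged sketch of the three-step pipeline. leap_search(dataset,
        # theta, sigma, lower_bound) is an assumed interface to structural
        # leap search with frequency threshold theta and leap length sigma.
        # Step 1: structural leap search with threshold theta = 1.0.
        theta = 1.0
        g = leap_search(dataset, theta, sigma, lower_bound=None)
        # Step 2: frequency-descending mining until the score of g converges.
        while True:
            theta /= 2                   # assumed descending schedule
            g_next = leap_search(dataset, theta, sigma, lower_bound=score(g))
            if score(g_next) - score(g) < eps:
                g = max(g, g_next, key=score)
                break
            g = g_next
        # Step 3: seed a final leap search with the best score found so far,
        # drop the frequency threshold, and output the discovered pattern.
        return leap_search(dataset, 0.0, sigma, lower_bound=score(g))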
3.5 GraphSig: A Feature Representation Approach
Ranu and Singh [33] proposed GraphSig, a scalable method to mine significant
(measured by p-value) subgraphs based on a feature vector representation of graphs.
The first step is to convert each graph into a set of feature vectors, where each vector
represents a region within the graph. Prior probabilities of features are computed em-
pirically to evaluate the statistical significance of patterns in the feature space. Following
the analysis in the feature space, only a small portion of the exponential search space
is accessed for further analysis. This enables the use of existing frequent subgraph
mining techniques to mine significant patterns in a scalable manner, even when they
are infrequent. The major steps of GraphSig are described as follows.
Sliding Window Across Graphs As the first step, random walk with restart (RWR)
is performed on each node in a graph to simulate sliding a window across
the graph. RWR simulates the trajectory of a random walker that starts from the
target node and jumps from one node to a neighbor. Each neighbor has an equal
probability of becoming the new position of the walker. At each jump, the feature
traversed, which can be either an edge label or a node label, is updated. A restart
probability α brings the walker back to the starting node within approximately 1/α
jumps. The random walk iterates until the feature distribution converges. As a result,
RWR produces a continuous distribution of features for each node, where a feature
value lies in the range [0, 1]; this value is further discretized into ten bins. RWR can
therefore be visualized as placing a window at each node of a graph and capturing
a feature vector representation of the subgraph within it. A graph of m nodes is
represented by m feature vectors. RWR inherently takes proximity of features into
account.
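A minimal sketch of this step is given below, assuming an adjacency-list graph with node and edge labels stored in both directions. The simulation-based estimator, the fixed step budget standing in for a convergence test, and all names are illustrative simplifications, not GraphSig's actual implementation.

    import random
    from collections import defaultdict

    def rwr_feature_vector(adj, node_labels, edge_labels, start,
                           alpha=0.25, n_steps=100000, bins=10):
        # Hedged sketch: estimate the feature distribution seen by a random
        # walker with restart probability alpha, started from `start`.
        # adj[u] lists the neighbors of u; node_labels[u] and
        # edge_labels[(u, v)] are the features recorded along the walk.
        counts = defaultdict(int)
        u = start
        for _ in range(n_steps):             # long run ~ iterate to convergence
            if random.random() < alpha or not adj[u]:
                u = start                    # restart at the target node
                continue
            v = random.choice(adj[u])        # each neighbor equally likely
            counts[edge_labels[(u, v)]] += 1 # update the traversed edge feature
            counts[node_labels[v]] += 1      # ...and the reached node feature
            u = v
        total = sum(counts.values()) or 1
        # continuous feature values in [0, 1], discretized into ten bins
        return {f: min(int(c / total * bins), bins - 1)
                for f, c in counts.items()}

    def graph_to_vectors(adj, node_labels, edge_labels):
        # A graph of m nodes yields m feature vectors, one window per node.
        return {u: rwr_feature_vector(adj, node_labels, edge_labels, u)
                for u in adj}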