Algorithm 15 (fLeap) outlines the frequency-descending strategy. It starts with
the highest frequency threshold and then lowers the threshold until the objective
score of the best graph pattern converges. Line 5 executes a frequent subgraph mining
routine, fpmine, which could be FSG [29], gSpan [45], etc. fpmine selects the most
significant graph pattern g from the frequent subgraphs it mines. Line 6 implements
a simple frequency-descending method.
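The loop can be sketched as follows. This is only an illustration: fpmine is treated as a black box, the scoring function is abstract, and the halving schedule on line 6's "simple frequency descending method" is an assumption, since the text does not specify the schedule.

    def fleap(dataset, fpmine, score, eps=1e-3):
        # Hedged sketch of the frequency-descending strategy (fLeap).
        # fpmine(dataset, theta) is assumed to run a frequent subgraph miner
        # (e.g., FSG or gSpan) at frequency threshold theta and return the
        # most significant pattern among the frequent subgraphs it mines.
        theta = 1.0                      # start with the highest threshold
        g = fpmine(dataset, theta)
        while theta > 0:
            theta /= 2                   # a simple descending schedule (assumed)
            g_next = fpmine(dataset, theta)
            if score(g_next) - score(g) < eps:
                break                    # objective score has converged
            g = g_next
        return max(g, g_next, key=score)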
Descending Leap Mine With structural leap search and frequency-descending min-
ing, a general mining pipeline is built for mining significant graph patterns in a
complex graph dataset. It consists of the following three steps (a sketch of the full
pipeline follows the list).
1. Perform structural leap search with threshold θ = 1.0, generating an optimal
pattern candidate g.
2. Repeat frequency-descending mining with structural leap search until the objective
score of g converges.
3. Take the best score discovered so far; perform structural leap search again (leap
length σ) without frequency threshold; output the discovered pattern.
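Putting the three steps together gives the sketch below. The leap_search interface, including its lower_bound argument for seeding the search with the best score found so far, is hypothetical; the text names the steps but not the exact signatures.

    def descending_leap_mine(dataset, leap_search, score, sigma, eps=1e-3):
        # Hedged sketch of the three-step pipeline. leap_search(dataset,
        # theta, sigma, lower_bound) is an assumed interface to structural
        # leap search with frequency threshold theta and leap length sigma.
        # Step 1: structural leap search with threshold theta = 1.0.
        theta = 1.0
        g = leap_search(dataset, theta, sigma, lower_bound=None)
        # Step 2: frequency-descending mining until the score of g converges.
        while True:
            theta /= 2                   # assumed descending schedule
            g_next = leap_search(dataset, theta, sigma, lower_bound=score(g))
            if score(g_next) - score(g) < eps:
                g = max(g, g_next, key=score)
                break
            g = g_next
        # Step 3: seed a final leap search with the best score found so far,
        # drop the frequency threshold, and output the discovered pattern.
        return leap_search(dataset, 0.0, sigma, lower_bound=score(g))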
3.5 GraphSig: A Feature Representation Approach
Ranu and Singh [33] proposed GraphSig, a scalable method to mine significant
(measured by p-value) subgraphs based on a feature vector representation of graphs.
The first step is to convert each graph into a set of feature vectors, where each vector
represents a region within the graph. Prior probabilities of features are computed em-
pirically to evaluate the statistical significance of patterns in the feature space. Following
the analysis in the feature space, only a small portion of the exponential search space
is accessed for further analysis. This enables the use of existing frequent subgraph
mining techniques to mine significant patterns in a scalable manner, even when they
are infrequent. The major steps of GraphSig are described as follows.
Sliding Window Across Graphs As the first step, random walk with restart (RWR)
is performed on each node in a graph to simulate sliding a window across
the graph. RWR simulates the trajectory of a random walker that starts from the
target node and jumps from one node to a neighbor. Each neighbor has an equal
probability of becoming the new position of the walker. At each jump, the feature
traversed, which can be either an edge label or a node label, is updated. A restart
probability α brings the walker back to the starting node within approximately 1/α
jumps. The random walk iterates until the feature distribution converges. As a result,
RWR produces a continuous distribution of features for each node, where a feature
value lies in the range [0, 1]; this value is further discretized into ten bins. RWR can
therefore be visualized as placing a window at each node of a graph and capturing
a feature vector representation of the subgraph within it. A graph of m nodes is
represented by m feature vectors. RWR inherently takes proximity of features into
account.
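A minimal sketch of this step is given below, assuming an adjacency-list graph with node and edge labels stored in both directions. The simulation-based estimator, the fixed step budget standing in for a convergence test, and all names are illustrative simplifications, not GraphSig's actual implementation.

    import random
    from collections import defaultdict

    def rwr_feature_vector(adj, node_labels, edge_labels, start,
                           alpha=0.25, n_steps=100000, bins=10):
        # Hedged sketch: estimate the feature distribution seen by a random
        # walker with restart probability alpha, started from `start`.
        # adj[u] lists the neighbors of u; node_labels[u] and
        # edge_labels[(u, v)] are the features recorded along the walk.
        counts = defaultdict(int)
        u = start
        for _ in range(n_steps):             # long run ~ iterate to convergence
            if random.random() < alpha or not adj[u]:
                u = start                    # restart at the target node
                continue
            v = random.choice(adj[u])        # each neighbor equally likely
            counts[edge_labels[(u, v)]] += 1 # update the traversed edge feature
            counts[node_labels[v]] += 1      # ...and the reached node feature
            u = v
        total = sum(counts.values()) or 1
        # continuous feature values in [0, 1], discretized into ten bins
        return {f: min(int(c / total * bins), bins - 1)
                for f, c in counts.items()}

    def graph_to_vectors(adj, node_labels, edge_labels):
        # A graph of m nodes yields m feature vectors, one window per node.
        return {u: rwr_feature_vector(adj, node_labels, edge_labels, u)
                for u in adj}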