Mining Graph Patterns - Frequent Pattern Mining

Database Reference

In-Depth Information

P is defined by a parameter called node affinity A ( P ), and the edge density is

defined by D ( P ). A set of nodes P is said to be ( θ , γ )-significant, if A ( P )

≥

and D ( P )

≥ γ . The dense pattern mining algorithm works in two phases. The first

phase uses min-hash to identify all possible groups of nodes satisfying the affinity

property. Specifically, the algorithm scans the graphs one by one and generates k

independent minimum hash values together with the corresponding indices for each

node. A k

N matrix is formed to store these hash values, where N is the number of

nodes in the whole graph set. Then the problem of affinity-based node pattern mining

on the original graph set can be transformed to a support-based mining problem on

the transformed table. In the second phase, a new graph fragment database is created

from the transformed table in the first phase for computing the edge density estimates

for the identified node patterns. It is also discussed how to consolidate the two phases

into a single pass, so that the mining technique can be used for a data stream.

Bifet et al. [ 6 ] studied the problem of mining frequent closed graphs on evolving

data streams. First, an incremental mining algorithm is proposed, which assumes

that the data arrives in batches of graphs. Every time a new batch of graphs arrives,

a closed subgraph mining algorithm is applied on the new batch and the set of

frequent closed graphs is updated. The second algorithm focuses on mining frequent

subgraphs with a sliding window. The difference to the incremental mining algorithm

lies in the management of the items in the sliding window. When the window is full,

the oldest batch on the sliding window is deleted. The third algorithm is an adaptive

mining one, which can adapt to the changes on the stream, maintaining only the

currently frequent closed graphs.

Bahmani et al. [ 5 ] studied the problem of mining the densest subgraph in a stream-

ing environment, and showed how to parallelize their algorithms in the MapReduce

model. They adopt the semi-streaming model, which assumes that the set of nodes is

known in advance, and the edges are streamed. For an undirected graph G

( V , E ),

= | E ( S ) |

| S |

given a subset of vertices S

, where E ( S )

is the set of edges in the induced subgraph by S . For a directed graph G

⊆

V , its density is defined as ρ ( S )

( V , E ),

= | E ( S , T ) |

given two subsets of vertices S , T

where E ( S , T ) is the set of edges in the induced subgraph by S , T . The problem of

mining the densest subgraph is to find S which maximizes ρ ( S ) in an undirected

graph, or to find S , T which maximize ρ ( S , T ) in a directed graph. Streaming al-

gorithms are proposed for finding approximately densest subgraphs. Specifically, a

⊆

V , their density is defined as ρ ( S , T )

√ | S || T |

2 )-approximation algorithm is proposed for both undirected graphs and di-

rected graphs. A (3

3 )-approximation algorithm is proposed when there is a size

constraint of k nodes on the densest subgraph. All these algorithms make O ( log n )

passes over the input graph and use O ( n ) main memory. As an example, the approxi-

mate algorithm on undirected graphs works as follows. Starting with the input graph

G , the algorithm computes the current density, ρ ( G ), and then removes all the nodes

(and their incident edges) whose degree is less than (2

ρ ( G ). If the resulting

graph is non-empty, then the algorithm recurses on the remaining graph with the node

set denoted by S , until the graph becomes empty. The node subset S which achieves

the highest density in the process is returned as the approximately densest subgraph.

2 )

Frequent Pattern Mining

Search WWH ::

Custom Search

Home