Mining Graph Patterns - Frequent Pattern Mining

Database Reference

In-Depth Information

different definitions lead to different properties of various dense subgraph patterns,

and thus different algorithm design. While some dense subgraph patterns can be dis-

covered in polynomial time, mining other types of dense subgraphs has been proved

to be NP-complete.

5.1

Cliques and Quasi-Cliques

In an undirected graph G

V , such

that every two vertices in C are connected by an edge. C is called a maximal clique

if there exists no clique C in G such that C

=

( V , E ), a clique is a subset of vertices, C

⊆

C . A set of vertices C is an α -quasi-

⊂

clique, if the number of edges in the induced subgraph by C is no less than α | C 2 ,

where α

(0, 1).

Cheng et al. [ 10 ] proposed an external-memory algorithm, called ExtMCE, for

maximal clique enumeration from massive scale-free graphs which cannot fit into the

main memory. Given an input graph G , ExtMCE recursively computes a portion of

G at a time which can be fit into the main memory for MCE computation. To create

portions of G that can be used for MCE computation effectively, a novel concept of

H ∗ -graph is proposed, which is inspired by the h-index concept. The core part of the

H ∗ -graph is the largest set of h vertices in G that have degree at least h , called the

h-vertices . The induced subgraph of G by the h -vertices is further extended to their

neighborhood to form the H ∗ -graph. Theoretical analysis is provided to show that

the H ∗ -graph is a small portion of a large scale-free network, thus it is feasible to

perform MCE computation based on the H ∗ -graph in memory.

Given the H ∗ -graph G H ∗ , the local maximal cliques are first extracted from G H ∗ ,

based on which the global maximal cliques are identified by linking to the remaining

part of G . After that, G H ∗ is removed from G . In the following steps, another

subgraph having similar structure as G H ∗ is extracted from G for local and global

maximal clique enumeration in memory, in a similar way as described above. The

recursive steps continue until G becomes empty.

Recently, Wang et al. [ 42 ] studied redundancy-aware maximal clique computation

by generating a subset of maximal cliques as a concise and complete summary of the

whole set of maximal cliques

∈

( G ) in a graph G . A notion of visibility is introduced

to measure how well a maximal clique is covered by the subset. Then the problem is to

find a subset of maximal cliques

M

S ⊆ M

( G ), such that the visibility of each maximal

clique C

( G ) is at least τ , a user-specified parameter. A randomized algorithm

and its deterministic version are proposed for the redundancy-aware maximal clique

computation.

Xiang et al. [ 44 ] proposed an algorithm for computing the maximum clique,

which is a clique of the largest possible size in a graph G , based on the MapReduce

framework. The main idea is to recursively partition the graph into smaller, possibly

overlapping subgraphs so that each node in the MapReduce cluster can independently

compute the maximum clique for its partition. When two vertices connected by an

edge are placed in two different partitions, one of the vertices has to be replicated in

∈ M

Frequent Pattern Mining

Search WWH ::

Custom Search

Home