Database Reference
In-Depth Information
P is defined by a parameter called node affinity A ( P ), and the edge density is
defined by D ( P ). A set of nodes P is said to be ( θ , γ )-significant, if A ( P )
θ
and D ( P )
γ . The dense pattern mining algorithm works in two phases. The first
phase uses min-hash to identify all possible groups of nodes satisfying the affinity
property. Specifically, the algorithm scans the graphs one by one and generates k
independent minimum hash values together with the corresponding indices for each
node. A k
N matrix is formed to store these hash values, where N is the number of
nodes in the whole graph set. Then the problem of affinity-based node pattern mining
on the original graph set can be transformed to a support-based mining problem on
the transformed table. In the second phase, a new graph fragment database is created
from the transformed table in the first phase for computing the edge density estimates
for the identified node patterns. It is also discussed how to consolidate the two phases
into a single pass, so that the mining technique can be used for a data stream.
Bifet et al. [ 6 ] studied the problem of mining frequent closed graphs on evolving
data streams. First, an incremental mining algorithm is proposed, which assumes
that the data arrives in batches of graphs. Every time a new batch of graphs arrives,
a closed subgraph mining algorithm is applied on the new batch and the set of
frequent closed graphs is updated. The second algorithm focuses on mining frequent
subgraphs with a sliding window. The difference to the incremental mining algorithm
lies in the management of the items in the sliding window. When the window is full,
the oldest batch on the sliding window is deleted. The third algorithm is an adaptive
mining one, which can adapt to the changes on the stream, maintaining only the
currently frequent closed graphs.
Bahmani et al. [ 5 ] studied the problem of mining the densest subgraph in a stream-
ing environment, and showed how to parallelize their algorithms in the MapReduce
model. They adopt the semi-streaming model, which assumes that the set of nodes is
known in advance, and the edges are streamed. For an undirected graph G
×
=
( V , E ),
= | E ( S ) |
| S |
given a subset of vertices S
, where E ( S )
is the set of edges in the induced subgraph by S . For a directed graph G
V , its density is defined as ρ ( S )
=
( V , E ),
= | E ( S , T ) |
given two subsets of vertices S , T
,
where E ( S , T ) is the set of edges in the induced subgraph by S , T . The problem of
mining the densest subgraph is to find S which maximizes ρ ( S ) in an undirected
graph, or to find S , T which maximize ρ ( S , T ) in a directed graph. Streaming al-
gorithms are proposed for finding approximately densest subgraphs. Specifically, a
(2
V , their density is defined as ρ ( S , T )
| S || T |
2 )-approximation algorithm is proposed for both undirected graphs and di-
rected graphs. A (3
+
3 )-approximation algorithm is proposed when there is a size
constraint of k nodes on the densest subgraph. All these algorithms make O ( log n )
passes over the input graph and use O ( n ) main memory. As an example, the approxi-
mate algorithm on undirected graphs works as follows. Starting with the input graph
G , the algorithm computes the current density, ρ ( G ), and then removes all the nodes
(and their incident edges) whose degree is less than (2
+
ρ ( G ). If the resulting
graph is non-empty, then the algorithm recurses on the remaining graph with the node
set denoted by S , until the graph becomes empty. The node subset S which achieves
the highest density in the process is returned as the approximately densest subgraph.
+
2 )
·
Search WWH ::




Custom Search