Database Reference
In-Depth Information
Fig. 4.11
MARGIN
algorithm
Definition 4.16
A cut between two nodes in a pattern lattice L is defined as an
ordered pair
(
C
†
P
)
where P is the parent of C
∈
L and C is not frequent while P
is frequent. The symbol
†
is read as
cut.
The frequent node
P
of a cut is called
f-cut
. The
MARGIN
algorithm is based on
the following property which gives the intuition as to why two maximal patterns can
be reached from one to the other.
Upper Diamond Property
Any two children
C
i
,
C
j
of a node
P
in a pattern lattice
L
have a common child node
A
.
The set of candidate subgraphs that are likely to become maximally frequent are
the
f-cut
nodes. This is because they are frequent subgraphs having an infrequent
child.
MARGIN
avoids traversing the lattice bottom up and instead traverses the cuts
alone in each lattice of the given graphs. The set of all
f-cut
nodes are further pruned
to given the set of all maximal frequent subgraphs. Essentially,
MARGIN
works in
the following two main steps (Fig.
4.11
).
Stage I
Find the initial
f-cut
nodes by dropping edges one by one from the initial
graph
G
, ensuring that the resulting subgraph is connected until it finds the first
frequent subgraph
R
i
. The frequent subgraph found by such dropping of edges are
called the
Representative R
i
of
G
. Accordingly, the initial cut is thus (
CR
i
†
R
i
)
where
CR
i
is the infrequent child of
R
i
.
Stage II
For each cut discovered in
G
, an algorithm called
ExpandCut
is used to
recursively extends the cut to generate all cuts in
G
.
ExpandCut
expands a given
cut such that all its neighboring cuts will be explored.
8
Conclusion
With the increasing data size of today's real-life applications, long patterns are gain-
ing increasing recognition in a wide range of domains including bioinformatics,
social network analysis, software engineering and business intelligence. Yet the task