Database Reference
In-Depth Information
Fig. 4.11 MARGIN
algorithm
Definition 4.16 A cut between two nodes in a pattern lattice L is defined as an
ordered pair ( C P ) where P is the parent of C
L and C is not frequent while P
is frequent. The symbol is read as cut.
The frequent node P of a cut is called f-cut . The MARGIN algorithm is based on
the following property which gives the intuition as to why two maximal patterns can
be reached from one to the other.
Upper Diamond Property Any two children C i , C j of a node P in a pattern lattice
L have a common child node A .
The set of candidate subgraphs that are likely to become maximally frequent are
the f-cut nodes. This is because they are frequent subgraphs having an infrequent
child. MARGIN avoids traversing the lattice bottom up and instead traverses the cuts
alone in each lattice of the given graphs. The set of all f-cut nodes are further pruned
to given the set of all maximal frequent subgraphs. Essentially, MARGIN works in
the following two main steps (Fig. 4.11 ).
Stage I Find the initial f-cut nodes by dropping edges one by one from the initial
graph G , ensuring that the resulting subgraph is connected until it finds the first
frequent subgraph R i . The frequent subgraph found by such dropping of edges are
called the Representative R i of G . Accordingly, the initial cut is thus ( CR i R i )
where CR i is the infrequent child of R i .
Stage II For each cut discovered in G , an algorithm called ExpandCut is used to
recursively extends the cut to generate all cuts in G . ExpandCut expands a given
cut such that all its neighboring cuts will be explored.
8
Conclusion
With the increasing data size of today's real-life applications, long patterns are gain-
ing increasing recognition in a wide range of domains including bioinformatics,
social network analysis, software engineering and business intelligence. Yet the task
Search WWH ::




Custom Search