Database Reference
In-Depth Information
between the lower and upper borders, minimizing the expected number of scans
through the entire database.
In graph mining, AGM [ 11 ], the typical Apriori -based frequent graph mining
algorithm, generates a new frequent graph pattern candidate of size k
+
1 by joining
two size- k only if they share the same size-( k
1) subgraph, where the size here is
measured by the number of vertices. Similar approach of joining is also adopted by
FSG [ 13 ] which adopts an edge-based candidate generation scheme.
The advantage of breadth-first approach is the completeness of mining result
and minimum number of lattice node visits as a result of the Apriori -style pattern
joining in the pattern candidate generation. However, the fact that the breadth-first
approaches would exhaust pattern candidates at one level before going down to
larger ones at the next level makes reaching long patterns particularly difficult for
these algorithms. What is worse, the exponentially large number of potential patterns
of medium sizes could make the mining algorithm to get stuck and fail to find any
long pattern before draining the system memory. To overcome this issue, algorithms
have been proposed to adopt a depth-first approach as discussed below.
4.2
Depth-First Approach
Instead of enumerating all the pattern candidates of size k before exploring larger
ones, depth-first approach grows a pattern as much as possible until the frequency
threshold cannot be satisfied, adopting a depth-first style of traversal down the pattern
lattice. The advantage of such an approach is the following. First, search space
pruning could be most effective in this case. This is due to the fact that maximal
pattern mining algorithms depend on “look-ahead” technique in which an entire
subtree of an enumeration node in the lexicographic tree is pruned if the longest
pattern that can possibly be generated from that subtree is a subset of a frequent
pattern that has already been found. Clearly, the pruning will be most effective when
long or maximal patterns are found earlier in the exploration process. Indeed, a depth-
first strategy always explores the itemsets in certain canonical order (after fixing the
lexicographic ordering of items). Most of the maximal patterns are always found
earlier than their subsets in this order. For a pattern of length l , only ( l
1) of its
(immediate prefix) subsets from the 2 l possibilities are explored before discovering
the pattern. In comparison, breadth-first algorithms with level-wise exploration are
denied the chance of such pruning because they find all patterns of the same size
at any given stage of the algorithm. Second, long patterns are more likely to be
discovered by avoiding being trapped by the huge number of mid-sized ones. Third,
the depth-first exploration strategy better facilitates a memory-efficient reuse of the
counting work done at the higher levels of the enumeration tree with the use of
projected databases.
As a pioneer, DepthProject [ 2 ] finds maximal long itemsets by a depth-first search
of a lexicographic tree of itemsets, together with the look-ahead pruning technique
Search WWH ::




Custom Search