Mining Long Patterns - Frequent Pattern Mining

Database Reference

In-Depth Information

between the lower and upper borders, minimizing the expected number of scans

through the entire database.

In graph mining, AGM [ 11 ], the typical Apriori -based frequent graph mining

algorithm, generates a new frequent graph pattern candidate of size k

+

1 by joining

two size- k only if they share the same size-( k

1) subgraph, where the size here is

measured by the number of vertices. Similar approach of joining is also adopted by

FSG [ 13 ] which adopts an edge-based candidate generation scheme.

The advantage of breadth-first approach is the completeness of mining result

and minimum number of lattice node visits as a result of the Apriori -style pattern

joining in the pattern candidate generation. However, the fact that the breadth-first

approaches would exhaust pattern candidates at one level before going down to

larger ones at the next level makes reaching long patterns particularly difficult for

these algorithms. What is worse, the exponentially large number of potential patterns

of medium sizes could make the mining algorithm to get stuck and fail to find any

long pattern before draining the system memory. To overcome this issue, algorithms

have been proposed to adopt a depth-first approach as discussed below.

−

4.2

Depth-First Approach

Instead of enumerating all the pattern candidates of size k before exploring larger

ones, depth-first approach grows a pattern as much as possible until the frequency

threshold cannot be satisfied, adopting a depth-first style of traversal down the pattern

lattice. The advantage of such an approach is the following. First, search space

pruning could be most effective in this case. This is due to the fact that maximal

pattern mining algorithms depend on “look-ahead” technique in which an entire

subtree of an enumeration node in the lexicographic tree is pruned if the longest

pattern that can possibly be generated from that subtree is a subset of a frequent

pattern that has already been found. Clearly, the pruning will be most effective when

long or maximal patterns are found earlier in the exploration process. Indeed, a depth-

first strategy always explores the itemsets in certain canonical order (after fixing the

lexicographic ordering of items). Most of the maximal patterns are always found

earlier than their subsets in this order. For a pattern of length l , only ( l

1) of its

(immediate prefix) subsets from the 2 l possibilities are explored before discovering

the pattern. In comparison, breadth-first algorithms with level-wise exploration are

denied the chance of such pruning because they find all patterns of the same size

at any given stage of the algorithm. Second, long patterns are more likely to be

discovered by avoiding being trapped by the huge number of mid-sized ones. Third,

the depth-first exploration strategy better facilitates a memory-efficient reuse of the

counting work done at the higher levels of the enumeration tree with the use of

projected databases.

As a pioneer, DepthProject [ 2 ] finds maximal long itemsets by a depth-first search

of a lexicographic tree of itemsets, together with the look-ahead pruning technique

−

Frequent Pattern Mining

Search WWH ::

Custom Search

Home