Mining Long Patterns - Frequent Pattern Mining

Database Reference

In-Depth Information

4

Pattern Enumeration Approach

The most straightforward way of mining the long patterns would be to enumerate all

the frequent patterns and report those whose sizes are sufficiently large. Depending

on the way they traverse the pattern lattice, they can be categorised into two classes:

Breadth-First Approach and Depth-First Approach.

4.1

Breadth-First Approach

The breadth-first approach traverse the pattern lattice level by level, always examining

all patterns at one level before going down to the next level. The representative

algorithms are the Apriori -based ones. The search for frequent patterns is conducted

from patterns of smaller sizes to larger ones by levels of the pattern lattice. At each

level, a new frequent pattern is discovered by joining two similar but slightly different

frequent patterns discovered at the previous level [ 1 ]. To expedite the mining for larger

patterns, some typical look-ahead technique has been proposed to identify maximal

frequent patterns without visiting every frequent patterns. In the setting of itemset

mining, an early work to adopt the look-ahead technique is [ 30 ] presenting MaxEclat

and MaxClique in which the algorithms look ahead during the initialization stage

to identify large frequent itemsets. MaxMiner [ 4 ] is an algorithm improved beyond

[ 30 ] to employ a breadth-first traversal of the pattern lattice for finding maximal

itemsets. It uses a look-ahead pruning technique throughout the search to identify

long patterns as early as possible, thus reducing database scanning, i.e., if a node

with all its extensions can be determined to be frequent, there is no need to further

process that node. Besides it also employs item re-ordering heuristic to increase

the effectiveness of superset-frequency pruning. As a result, MaxMiner is able to

achieve a performance improvement of at least an order of magnitude compared to

other look-ahead techniques. In practice, MaxMiner has demonstrated a runtime

which is roughly linear in the number of maximal frequent itemsets and the size

of the database, irrespective of the size of the largest frequent itemset, which is

significantly faster than previous Apriori-based approaches that scale exponentially

with the size of the largest pattern.

For sequential pattern mining, long patterns have been studied in a noisy envi-

ronment such as gene expression analysis in [ 26 ], where long patterns are expected

yet symbols can be misrepresented to prevent frequent patterns from being correctly

discovered. The authors proposed a sampling-based method using the well-known

Chernoff bound to estimate the ambiguous patterns whose matches in the sample

are very close to the threshold, so that there is no sufficient statistical confidence to

tell whether the pattern would be frequent or not in the entire database. In addition,

to speed up the pattern frequency verification, a technique called border collapsing

was proposed based on the observation that the set of all ambiguous patterns occupy

a contiguous portion of the pattern lattice by the Apriori property. The border of

frequent patterns can then be located efficiently by successively collapsing the gap

Search WWH ::

Custom Search

Home