Database Reference
In-Depth Information
4
Pattern Enumeration Approach
The most straightforward way of mining the long patterns would be to enumerate all
the frequent patterns and report those whose sizes are sufficiently large. Depending
on the way they traverse the pattern lattice, they can be categorised into two classes:
Breadth-First Approach and Depth-First Approach.
4.1
Breadth-First Approach
The breadth-first approach traverse the pattern lattice level by level, always examining
all patterns at one level before going down to the next level. The representative
algorithms are the Apriori -based ones. The search for frequent patterns is conducted
from patterns of smaller sizes to larger ones by levels of the pattern lattice. At each
level, a new frequent pattern is discovered by joining two similar but slightly different
frequent patterns discovered at the previous level [ 1 ]. To expedite the mining for larger
patterns, some typical look-ahead technique has been proposed to identify maximal
frequent patterns without visiting every frequent patterns. In the setting of itemset
mining, an early work to adopt the look-ahead technique is [ 30 ] presenting MaxEclat
and MaxClique in which the algorithms look ahead during the initialization stage
to identify large frequent itemsets. MaxMiner [ 4 ] is an algorithm improved beyond
[ 30 ] to employ a breadth-first traversal of the pattern lattice for finding maximal
itemsets. It uses a look-ahead pruning technique throughout the search to identify
long patterns as early as possible, thus reducing database scanning, i.e., if a node
with all its extensions can be determined to be frequent, there is no need to further
process that node. Besides it also employs item re-ordering heuristic to increase
the effectiveness of superset-frequency pruning. As a result, MaxMiner is able to
achieve a performance improvement of at least an order of magnitude compared to
other look-ahead techniques. In practice, MaxMiner has demonstrated a runtime
which is roughly linear in the number of maximal frequent itemsets and the size
of the database, irrespective of the size of the largest frequent itemset, which is
significantly faster than previous Apriori-based approaches that scale exponentially
with the size of the largest pattern.
For sequential pattern mining, long patterns have been studied in a noisy envi-
ronment such as gene expression analysis in [ 26 ], where long patterns are expected
yet symbols can be misrepresented to prevent frequent patterns from being correctly
discovered. The authors proposed a sampling-based method using the well-known
Chernoff bound to estimate the ambiguous patterns whose matches in the sample
are very close to the threshold, so that there is no sufficient statistical confidence to
tell whether the pattern would be frequent or not in the entire database. In addition,
to speed up the pattern frequency verification, a technique called border collapsing
was proposed based on the observation that the set of all ambiguous patterns occupy
a contiguous portion of the pattern lattice by the Apriori property. The border of
frequent patterns can then be located efficiently by successively collapsing the gap
Search WWH ::




Custom Search