Databases Reference
In-Depth Information
its core patterns. For instance, abcef can be generated by merging just two of its core
patterns, ab and cef , instead of having to merge all of its 26 core patterns.
Now, let's see how these observations can help us leap through pattern space more
directly toward colossal patterns. Consider the following scheme. First, generate a com-
plete set of frequent patterns up to a user-specified small size, and then randomly pick
a pattern,
.
will have a high probability of being a core-descendant of some colossal
pattern,
's core-descendants in this complete set, and merge them.
This generates a much larger core-descendant of
. Identify all of
, giving us the ability to leap along
a path toward
. In the same fashion we select K pat-
terns. The set of larger core-descendants generated is the candidate pool for the next
iteration.
A question arises: Given
in the core-pattern tree, T
, a core-descendant of a colossal pattern
, how can we
find the other core-descendants of
? Given two patterns,
and
, the pattern dis-
/D 1 j D \ D j
tance between them is defined as Dist
.
,
j D [ D j . Pattern distance satisfies the
triangle inequality.
For a pattern,
, let C
be the set of all its core patterns. It can be shown that C
./D 1 1
2
is bounded in metric space by a “ball” of diameter r
./
, where r
=1 . This
means that given a core pattern
's core patterns in the
current pool by posing a range query. Note that in the mining algorithm, each ran-
domly drawn pattern could be a core-descendant of more than one colossal pattern,
and as such, when merging the patterns found by the “ball,” more than one larger
core-descendant could be generated.
From this discussion, the Pattern-Fusion method is outlined in the following two
phases:
2 C
, we can identify all of
1. Initial Pool: Pattern-Fusion assumes an initial pool of small frequent patterns is
available. This is the complete set of frequent patterns up to a small size (e.g., 3).
This initial pool can be mined with any existing efficient mining algorithm.
2. Iterative Pattern-Fusion: Pattern-Fusion takes as input a user-specified parameter,
K , which is the maximum number of patterns to be mined. The mining process is
iterative. At each iteration, K seed patterns are randomly picked from the current
pool. For each of these K seeds, we find all the patterns within a ball of a size spec-
ified by
. All the patterns in each “ball” are then fused together to generate a set of
superpatterns. These superpatterns form a new pool. If the pool contains more than
K patterns, the next iteration begins with this pool for the new round of random
drawing. As the support set of every superpattern shrinks with each new iteration,
the iteration process terminates.
Note that Pattern-Fusion merges small subpatterns of a large pattern instead of
incrementally-expanding patterns with single items . This gives the method an advantage
to circumvent midsize patterns and progress on a path leading to a potential colossal
pattern. The idea is illustrated in Figure 7.10. Each point shown in the metric space
 
Search WWH ::




Custom Search