Advanced Pattern Mining - Data Mining: Concepts and Techniques

Databases Reference

In-Depth Information

its core patterns. For instance, abcef can be generated by merging just two of its core

patterns, ab and cef , instead of having to merge all of its 26 core patterns.

Now, let's see how these observations can help us leap through pattern space more

directly toward colossal patterns. Consider the following scheme. First, generate a com-

plete set of frequent patterns up to a user-specified small size, and then randomly pick

a pattern,

.

will have a high probability of being a core-descendant of some colossal

pattern,

's core-descendants in this complete set, and merge them.

This generates a much larger core-descendant of

. Identify all of

, giving us the ability to leap along

a path toward

. In the same fashion we select K pat-

terns. The set of larger core-descendants generated is the candidate pool for the next

iteration.

A question arises: Given

in the core-pattern tree, T

, a core-descendant of a colossal pattern

, how can we

find the other core-descendants of

? Given two patterns,

and

, the pattern dis-

/D 1 j D \ D j

tance between them is defined as Dist

.

,

j D [ D j . Pattern distance satisfies the

triangle inequality.

For a pattern,

, let C

be the set of all its core patterns. It can be shown that C

./D 1 1

2

is bounded in metric space by a “ball” of diameter r

./

, where r

=1 . This

means that given a core pattern

's core patterns in the

current pool by posing a range query. Note that in the mining algorithm, each ran-

domly drawn pattern could be a core-descendant of more than one colossal pattern,

and as such, when merging the patterns found by the “ball,” more than one larger

core-descendant could be generated.

From this discussion, the Pattern-Fusion method is outlined in the following two

phases:

2 C

, we can identify all of

1. Initial Pool: Pattern-Fusion assumes an initial pool of small frequent patterns is

available. This is the complete set of frequent patterns up to a small size (e.g., 3).

This initial pool can be mined with any existing efficient mining algorithm.

2. Iterative Pattern-Fusion: Pattern-Fusion takes as input a user-specified parameter,

K , which is the maximum number of patterns to be mined. The mining process is

iterative. At each iteration, K seed patterns are randomly picked from the current

pool. For each of these K seeds, we find all the patterns within a ball of a size spec-

ified by

. All the patterns in each “ball” are then fused together to generate a set of

superpatterns. These superpatterns form a new pool. If the pool contains more than

K patterns, the next iteration begins with this pool for the new round of random

drawing. As the support set of every superpattern shrinks with each new iteration,

the iteration process terminates.

Note that Pattern-Fusion merges small subpatterns of a large pattern instead of

incrementally-expanding patterns with single items . This gives the method an advantage

to circumvent midsize patterns and progress on a path leading to a potential colossal

pattern. The idea is illustrated in Figure 7.10. Each point shown in the metric space

Data Mining: Concepts and Techniques

Search WWH ::

Custom Search

Home