Database Reference
In-Depth Information
this complete set, and merge all of them. This would generate a much larger core-
descendant of α , giving us the ability to leap along a path toward α in the core-pattern
tree T α . In the same fashion, the algorithm picks K patterns. The set of larger
core-descendants generated would be the candidate pool for the next iteration.
Given β , which is a core-descendant of a colossal pattern α , we need to find all
the other core-descendants of α . It can be shown that two core patterns of a pattern
α exhibit proximity in the corresponding metric space.
Definition 4.15 (Pattern Distance) For patterns α and β , the pattern distance of α
and β is defined to be Dist ( α , β )
| D α D β |
| D α D β |
.
It is not hard to see that ( S , Dist ) is a metric space, where S is a set of patterns
and Dist : S
=
1
R + is defined as in Definition 4.15. This means all the pattern
distances satisfy the triangle inequality.
For two patterns β 1 , β 2
×
S
C α ,wehave Dist ( β 1 , β 2 )
r ( τ ), where r ( τ )
=
1
1
2 1 . It follows that all core patterns of a pattern α are bounded in the metric
space by a “ball” of diameter r ( τ ). This means that given one core pattern β C α ,
one can identify all of α 's core patterns in the current pool by posing a range query.
In Pattern-Fusion , each randomly picked pattern could be a core-descendant of
more than one colossal pattern, and as such, when merging the patterns found by the
“ball”, more than one larger core-descendant could be generated.
The overview of Pattern-Fusion consists of two phases.
1. Initial Pool: Pattern-Fusion assumes available an initial pool of small frequent
patterns, which is the complete set of frequent patterns up to a small size, e.g., 3.
This initial pool can be mined with any existing efficient mining algorithm.
2. Iterative Pattern Fusion: Pattern-Fusion takes as input a user-specified pa-
rameter, K , which is the maximum number of patterns to be mined. The mining
process is conducted iteratively. At each iteration, K seed patterns are randomly
picked from the current pool. For each of these K seeds, it find all the patterns
within a ball of a size specified by τ as defined in Definition 4.13. All the patterns
in each “ball” are then fused together to generate a set of super-patterns. All the
super-patterns thus generated are put together as a new pool. If this pool contains
more than K patterns, the next iteration begins with this pool for the new round
of random drawing. The termination of the iteration process is guaranteed by the
fact that the support set of every super-pattern shrinks with each new iteration.
Pattern-Fusion merges all the small subpatterns of a long pattern in one step instead
of expanding patterns with additional single items . This gives Pattern-Fusion the
advantage to circumvent mid-sized patterns and progress on a path leading to a
potential colossal pattern. The idea is illustrated in Fig. 4.9 . Each point shown in the
metric space represents a core pattern. A larger pattern has far more core patterns
close to each other, all of which would be bounded by a ball as shown in dotted line,
than a smaller pattern. Since the ball of the larger pattern is much denser, Pattern-
Fusion will hit one of its core patterns with a higher probability when performing a
random draw from the initial pattern pool.
 
Search WWH ::




Custom Search