Mining Long Patterns - Frequent Pattern Mining

Database Reference

In-Depth Information

this complete set, and merge all of them. This would generate a much larger core-

descendant of α , giving us the ability to leap along a path toward α in the core-pattern

tree T α . In the same fashion, the algorithm picks K patterns. The set of larger

core-descendants generated would be the candidate pool for the next iteration.

Given β , which is a core-descendant of a colossal pattern α , we need to find all

the other core-descendants of α . It can be shown that two core patterns of a pattern

α exhibit proximity in the corresponding metric space.

Definition 4.15 (Pattern Distance) For patterns α and β , the pattern distance of α

and β is defined to be Dist ( α , β )

− | D α ∩ D β |

| D α ∪ D β |

It is not hard to see that ( S , Dist ) is a metric space, where S is a set of patterns

and Dist : S

R + is defined as in Definition 4.15. This means all the pattern

distances satisfy the triangle inequality.

For two patterns β 1 , β 2

→

∈

C α ,wehave Dist ( β 1 , β 2 )

≤

r ( τ ), where r ( τ )

2 /τ − 1 . It follows that all core patterns of a pattern α are bounded in the metric

space by a “ball” of diameter r ( τ ). This means that given one core pattern β ∈ C α ,

one can identify all of α 's core patterns in the current pool by posing a range query.

In Pattern-Fusion , each randomly picked pattern could be a core-descendant of

more than one colossal pattern, and as such, when merging the patterns found by the

“ball”, more than one larger core-descendant could be generated.

The overview of Pattern-Fusion consists of two phases.

−

1. Initial Pool: Pattern-Fusion assumes available an initial pool of small frequent

patterns, which is the complete set of frequent patterns up to a small size, e.g., 3.

This initial pool can be mined with any existing efficient mining algorithm.

2. Iterative Pattern Fusion: Pattern-Fusion takes as input a user-specified pa-

rameter, K , which is the maximum number of patterns to be mined. The mining

process is conducted iteratively. At each iteration, K seed patterns are randomly

picked from the current pool. For each of these K seeds, it find all the patterns

within a ball of a size specified by τ as defined in Definition 4.13. All the patterns

in each “ball” are then fused together to generate a set of super-patterns. All the

super-patterns thus generated are put together as a new pool. If this pool contains

more than K patterns, the next iteration begins with this pool for the new round

of random drawing. The termination of the iteration process is guaranteed by the

fact that the support set of every super-pattern shrinks with each new iteration.

Pattern-Fusion merges all the small subpatterns of a long pattern in one step instead

of expanding patterns with additional single items . This gives Pattern-Fusion the

advantage to circumvent mid-sized patterns and progress on a path leading to a

potential colossal pattern. The idea is illustrated in Fig. 4.9 . Each point shown in the

metric space represents a core pattern. A larger pattern has far more core patterns

close to each other, all of which would be bounded by a ball as shown in dotted line,

than a smaller pattern. Since the ball of the larger pattern is much denser, Pattern-

Fusion will hit one of its core patterns with a higher probability when performing a

random draw from the initial pattern pool.

Frequent Pattern Mining

Search WWH ::

Custom Search

Home