Advanced Pattern Mining - Data Mining: Concepts and Techniques - page 311


k representative patterns that have not only high significance but also low redundancy are called redundancy-aware top-k patterns.

Example 7.14 Redundancy-aware top-k strategy versus other top-k strategies. Figure 7.11 illustrates the intuition behind redundancy-aware top-k patterns versus traditional top-k patterns and k-summarized patterns. Suppose we have the set of frequent patterns shown in Figure 7.11(a), where each circle represents a pattern whose significance is shown in grayscale. The distance between two circles reflects the redundancy of the two corresponding patterns: the closer the circles are, the more redundant the respective patterns are to one another. Let's say we want to find three patterns that will best represent the given set, that is, k = 3. Which three should we choose?

Arrows indicate the patterns chosen by redundancy-aware top-k patterns (Figure 7.11b), traditional top-k patterns (Figure 7.11c), and k-summarized patterns (Figure 7.11d). In Figure 7.11(c), the traditional top-k strategy relies solely on significance: it selects the three most significant patterns to represent the set.
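
The contrast between ranking by significance alone and also penalizing redundancy can be sketched in a few lines of Python. This is only an illustrative sketch, not the algorithm from the book: the toy data, the distance-based `redundancy` function, the `penalty` weight, and the greedy selection rule are all assumptions made for the example.

```python
from math import dist

# Hypothetical toy data: each pattern has a 2-D position (closer means more
# redundant, as in Figure 7.11) and a significance score.
patterns = {
    "p1": ((0.0, 0.0), 0.95),
    "p2": ((0.1, 0.0), 0.90),
    "p3": ((0.0, 0.1), 0.85),
    "p4": ((5.0, 5.0), 0.70),
    "p5": ((5.1, 5.0), 0.40),
    "p6": ((9.0, 0.0), 0.60),
}

def traditional_top_k(patterns, k):
    """Rank by significance alone and keep the k best."""
    ranked = sorted(patterns, key=lambda p: patterns[p][1], reverse=True)
    return ranked[:k]

def redundancy(patterns, a, b):
    """Assumed redundancy measure in (0, 1]: decays as distance grows."""
    return 1.0 / (1.0 + dist(patterns[a][0], patterns[b][0]))

def redundancy_aware_top_k(patterns, k, penalty=1.0):
    """Greedy heuristic (an assumption, not the book's method): repeatedly add
    the pattern with the highest significance minus its maximum redundancy
    with the patterns already chosen."""
    chosen = []
    remaining = set(patterns)
    while remaining and len(chosen) < k:
        def gain(p):
            worst = max((redundancy(patterns, p, q) for q in chosen), default=0.0)
            return patterns[p][1] - penalty * worst
        best = max(sorted(remaining), key=gain)  # sorted() keeps ties deterministic
        chosen.append(best)
        remaining.remove(best)
    return chosen

print(traditional_top_k(patterns, 3))       # ['p1', 'p2', 'p3']
print(redundancy_aware_top_k(patterns, 3))  # ['p1', 'p4', 'p6']
```

On this toy data the significance-only ranking returns three near-duplicates from the same tight group, while the greedy variant spreads its picks across the three groups and still starts from the most significant pattern.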

In Figure 7.11(d), the k-summarized pattern strategy selects patterns based solely on nonredundancy. It detects three clusters and finds the most representative patterns to

[Figure 7.11: four panels, (a) through (d); arrow labels in the figure: "Significance + Relevance", "Significance", "Relevance"]

Figure 7.11 Conceptual view comparing top-k methodologies (where gray levels represent pattern significance, and the closer that two patterns are displayed, the more redundant they are to one another): (a) original patterns, (b) redundancy-aware top-k patterns, (c) traditional top-k patterns, and (d) k-summarized patterns.
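
The k-summarized strategy of panel (d), which ignores significance and looks only at how patterns group together, might be sketched as follows. This is a minimal illustration under stated assumptions: the positions are made-up data, the clustering is a simple farthest-first seeding followed by nearest-seed assignment, and "most representative" is taken to mean the medoid of each cluster. None of these choices comes from the source.

```python
from math import dist

# Hypothetical 2-D positions; closer points stand for more redundant patterns.
positions = {
    "p1": (0.0, 0.0), "p2": (0.2, 0.1), "p3": (0.5, 0.0),
    "p4": (5.0, 5.0), "p5": (5.2, 5.1), "p6": (5.5, 5.0),
    "p7": (9.0, 0.0), "p8": (9.2, 0.1), "p9": (9.5, 0.0),
}

def farthest_first_seeds(positions, k):
    """Pick k well-separated seeds: start anywhere, then keep adding the
    point whose nearest existing seed is farthest away."""
    names = sorted(positions)
    seeds = [names[0]]
    while len(seeds) < min(k, len(names)):
        candidates = [n for n in names if n not in seeds]
        nxt = max(
            candidates,
            key=lambda n: min(dist(positions[n], positions[s]) for s in seeds),
        )
        seeds.append(nxt)
    return seeds

def k_summarized(positions, k):
    """Cluster around the seeds, then return one medoid per cluster.
    Significance plays no role here, only distances."""
    seeds = farthest_first_seeds(positions, k)
    clusters = {s: [] for s in seeds}
    for n in sorted(positions):
        nearest = min(seeds, key=lambda s: dist(positions[n], positions[s]))
        clusters[nearest].append(n)
    representatives = []
    for members in clusters.values():
        # Medoid: the member with the smallest total distance to the others.
        medoid = min(
            members,
            key=lambda m: sum(dist(positions[m], positions[o]) for o in members),
        )
        representatives.append(medoid)
    return sorted(representatives)

print(k_summarized(positions, 3))  # ['p2', 'p5', 'p8']
```

Each returned pattern sits in the middle of its group, regardless of how significant it is, which is the trade-off the example attributes to selecting on nonredundancy alone.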
