k representative patterns that have not only high significance but also low redundancy are called redundancy-aware top-k patterns.
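The definition above does not say how significance and redundancy are traded off. The Python below is a minimal sketch that assumes each pattern has a significance score S(p) in [0, 1] and a redundancy score derived from its distance to other patterns in a hypothetical 2-D layout, mirroring the pictures in Figure 7.11. It contrasts traditional top-k selection, which ranks by significance alone, with a greedy redundancy-aware selection. At each step, the greedy selection picks the pattern with the highest significance minus its largest redundancy with any pattern already chosen. The names `Pattern`, `redundancy`, `traditional_top_k`, `redundancy_aware_top_k`, the `trade_off` weight, the exponential redundancy function, and the toy data are all illustrative assumptions. This is a maximal-marginal-relevance-style heuristic, not necessarily the exact formulation used in this text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    name: str
    x: float             # hypothetical 2-D layout position (as in Figure 7.11)
    y: float
    significance: float  # S(p), assumed to lie in [0, 1]


def redundancy(p: Pattern, q: Pattern) -> float:
    """Hypothetical redundancy in (0, 1]: the closer two patterns are, the more redundant."""
    return math.exp(-math.dist((p.x, p.y), (q.x, q.y)))


def traditional_top_k(patterns, k):
    """Rank by significance alone, ignoring redundancy."""
    return sorted(patterns, key=lambda p: p.significance, reverse=True)[:k]


def redundancy_aware_top_k(patterns, k, trade_off=1.0):
    """Greedy sketch: maximize significance minus the worst redundancy with the chosen set."""
    selected = []
    candidates = list(patterns)

    def marginal_gain(p):
        # Penalty is the largest redundancy between p and any already-selected pattern.
        penalty = max((redundancy(p, q) for q in selected), default=0.0)
        return p.significance - trade_off * penalty

    while candidates and len(selected) < k:
        best = max(candidates, key=marginal_gain)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data (invented for illustration): three clusters of near-duplicate patterns.
patterns = [
    Pattern("a1", 0.0, 0.0, 0.95), Pattern("a2", 0.5, 0.0, 0.93), Pattern("a3", 0.0, 0.5, 0.90),
    Pattern("b1", 10.0, 0.0, 0.70), Pattern("b2", 10.5, 0.0, 0.60),
    Pattern("c1", 0.0, 10.0, 0.50), Pattern("c2", 0.5, 10.0, 0.45),
]
print([p.name for p in traditional_top_k(patterns, 3)])       # ['a1', 'a2', 'a3']
print([p.name for p in redundancy_aware_top_k(patterns, 3)])  # ['a1', 'b1', 'c1']
```

With these toy values, traditional top-3 returns the three near-duplicate patterns of the first cluster. The greedy redundancy-aware selection instead returns the most significant pattern from each cluster, because a second pattern from an already-covered cluster loses most of its significance to the redundancy penalty.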
Example 7.14 Redundancy-aware top-k strategy versus other top-k strategies.
Figure 7.11 illustrates the intuition behind redundancy-aware top-k patterns versus traditional top-k patterns and k-summarized patterns. Suppose we have the set of frequent patterns shown in Figure 7.11(a), where each circle represents a pattern whose significance is indicated by its gray level. The distance between two circles reflects the redundancy of the two corresponding patterns: The closer the circles are, the more redundant the respective patterns are to one another. Let's say we want to find the three patterns that best represent the given set, that is, k = 3. Which three should we choose?
Arrows show the patterns chosen by the redundancy-aware top-k strategy (Figure 7.11b), the traditional top-k strategy (Figure 7.11c), and the k-summarized pattern strategy (Figure 7.11d). In Figure 7.11(c), the traditional top-k strategy relies solely on significance: It selects the three most significant patterns to represent the set. In Figure 7.11(d), the k-summarized pattern strategy selects patterns based solely on nonredundancy. It detects three clusters and finds the most representative patterns to
[Figure 7.11 panel labels: (b) Significance + Relevance; (c) Significance; (d) Relevance]
Figure 7.11 Conceptual view comparing top-k methodologies (where gray levels represent pattern significance, and the closer two patterns are displayed, the more redundant they are to one another): (a) original patterns, (b) redundancy-aware top-k patterns, (c) traditional top-k patterns, and (d) k-summarized patterns.