rules. CPAR's accuracy on numerous data sets was shown to be close to that of CMAR.
However, since CPAR generates far fewer rules than CMAR, it shows much better
efficiency with large sets of training data.
In summary, associative classification offers an alternative classification scheme by
building rules based on conjunctions of attribute-value pairs that occur frequently
in data.
9.4.2 Discriminative Frequent Pattern-Based Classification
From work on associative classification, we see that frequent patterns reflect strong asso-
ciations between attribute-value pairs (or items) in data and are useful for classification.
“But just how discriminative are frequent patterns for classification?” Frequent patterns
represent feature combinations. Let's compare the discriminative power of frequent pat-
terns and single features. Figure 9.11 plots the information gain of frequent patterns and
single features (i.e., of pattern length 1) for three UCI data sets. 5 The discrimination
power of some frequent patterns is higher than that of single features. Frequent patterns
map data to a higher dimensional space. They capture more underlying semantics of the
data, and thus can hold greater expressive power than single features.
“Why not consider frequent patterns as combined features, in addition to single features,
when building a classification model?” This notion is the basis of frequent pattern-
based classification —the learning of a classification model in the feature space of single
attributes as well as frequent patterns. In this way, we transform the original feature
space into a larger one, which is likely to increase the chance of including important features.
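As a concrete illustration, the following minimal Python sketch (the transactions, items, and mined pattern here are hypothetical, not from the book's data sets) shows how tuples can be re-represented in the enlarged feature space of single attribute-value items plus frequent patterns:

```python
# Sketch: augmenting single-item features with frequent-pattern features.
# The transactions and the mined pattern below are illustrative only; in
# practice the patterns would come from a frequent itemset miner such as
# Apriori or FP-growth.

transactions = [
    {"age<=30", "income=high", "credit=fair"},
    {"age<=30", "income=high", "credit=excellent"},
    {"age>40", "income=medium", "credit=fair"},
]

single_items = sorted(set().union(*transactions))

frequent_patterns = [
    frozenset({"age<=30", "income=high"}),  # a conjunction of items
]

def to_feature_vector(transaction):
    """Map a transaction to a binary vector over single items plus patterns."""
    singles = [1 if item in transaction else 0 for item in single_items]
    combos = [1 if pattern <= transaction else 0 for pattern in frequent_patterns]
    return singles + combos  # the enlarged feature space

for t in transactions:
    print(to_feature_vector(t))
```

Each tuple becomes a binary vector whose trailing components indicate whether a mined pattern (a conjunction of attribute-value pairs) holds for that tuple; a classifier is then trained on these enlarged vectors.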
Let's get back to our earlier question: How discriminative are frequent patterns?
Many of the frequent patterns generated in frequent itemset mining are indiscrimina-
tive because they are based solely on support, without considering predictive power.
That is, by definition, a pattern must satisfy a user-specified minimum support threshold,
min_sup, to be considered frequent. For example, if min_sup is, say, 5%, a pattern
is frequent if it occurs in at least 5% of the data tuples. Consider Figure 9.12, which plots infor-
mation gain versus pattern frequency (support) for three UCI data sets. A theoretical
upper bound on information gain, which was derived analytically, is also plotted. The
figure shows that the discriminative power (assessed here as information gain) of low-
frequency patterns is bounded by a small value. This is due to the patterns' limited
coverage of the data set. Similarly, the discriminative power of very high-frequency pat-
terns is also bounded by a small value, which is due to their commonness in the data. The
upper bound on information gain is a function of pattern frequency, and it increases
monotonically with pattern frequency. These observations
can be confirmed analytically. Patterns with medium-large supports (e.g., support = 300
in Figure 9.12a) may be discriminative or not. Thus, not every frequent pattern is useful.
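To see why low-frequency patterns have bounded discriminative power, consider a two-class problem. The sketch below is an illustrative derivation under simplifying assumptions (a binary pattern feature that, in the best case, occurs only in tuples of one class, with support theta no larger than the class prior q), not the exact analysis behind Figure 9.12:

```python
import math

def entropy(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def info_gain_upper_bound(theta, q):
    """Best-case information gain of a pattern with support theta in a
    two-class problem with positive-class prior q, assuming theta <= q and
    that the pattern occurs only in positive-class tuples."""
    # When the pattern is present, the class is pure: H(C | pattern) = 0.
    # When absent, the positive-class fraction is (q - theta) / (1 - theta).
    return entropy(q) - (1 - theta) * entropy((q - theta) / (1 - theta))

q = 0.5  # illustrative class prior
for theta in (0.01, 0.05, 0.1, 0.3, 0.5):
    print(f"support={theta:.2f}  IG upper bound={info_gain_upper_bound(theta, q):.3f}")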
5 The University of California at Irvine (UCI) archives several large data sets at http://kdd.ics.uci.edu/.
These are commonly used by researchers for the testing and comparison of machine learning and data
mining algorithms.
 