rules. CPAR's accuracy on numerous data sets was shown to be close to that of CMAR.
However, since CPAR generates far fewer rules than CMAR, it shows much better
efficiency with large sets of training data.
In summary, associative classification offers an alternative classification scheme by
building rules based on conjunctions of attribute-value pairs that occur frequently
in data.
9.4.2 Discriminative Frequent Pattern-Based Classification
From work on associative classification, we see that frequent patterns reflect strong asso-
ciations between attribute-value pairs (or items) in data and are useful for classification.
“But just how discriminative are frequent patterns for classification?” Frequent patterns
represent feature combinations. Let's compare the discriminative power of frequent pat-
terns and single features. Figure 9.11 plots the information gain of frequent patterns and
single features (i.e., of pattern length 1) for three UCI data sets. 5 The discrimination
power of some frequent patterns is higher than that of single features. Frequent patterns
map data to a higher dimensional space. They capture more underlying semantics of the
data, and thus can hold greater expressive power than single features.
“Why not consider frequent patterns as combined features, in addition to single features,
when building a classification model?” This notion is the basis of frequent pattern-
based classification —the learning of a classification model in the feature space of single
attributes as well as frequent patterns. In this way, we transform the original feature
space into a larger one, which is likely to increase the chance of including important features.
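As a concrete illustration, the following minimal Python sketch (the transactions, items, and mined pattern here are hypothetical, not from the book's data sets) shows how tuples can be re-represented in the enlarged feature space of single attribute-value items plus frequent patterns:

```python
# Sketch: augmenting single-item features with frequent-pattern features.
# The transactions and the mined pattern below are illustrative only; in
# practice the patterns would come from a frequent itemset miner such as
# Apriori or FP-growth.

transactions = [
    {"age<=30", "income=high", "credit=fair"},
    {"age<=30", "income=high", "credit=excellent"},
    {"age>40", "income=medium", "credit=fair"},
]

single_items = sorted(set().union(*transactions))

frequent_patterns = [
    frozenset({"age<=30", "income=high"}),  # a conjunction of items
]

def to_feature_vector(transaction):
    """Map a transaction to a binary vector over single items plus patterns."""
    singles = [1 if item in transaction else 0 for item in single_items]
    combos = [1 if pattern <= transaction else 0 for pattern in frequent_patterns]
    return singles + combos  # the enlarged feature space

for t in transactions:
    print(to_feature_vector(t))
```

Each tuple becomes a binary vector whose trailing components indicate whether a mined pattern (a conjunction of attribute-value pairs) holds for that tuple; a classifier is then trained on these enlarged vectors.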
Let's get back to our earlier question: How discriminative are frequent patterns?
Many of the frequent patterns generated in frequent itemset mining are indiscrimina-
tive because they are based solely on support, without considering predictive power.
That is, by definition, a pattern must satisfy a user-specified minimum support threshold,
min_sup, to be considered frequent. For example, if min_sup is, say, 5%, a pattern
is frequent if it occurs in at least 5% of the data tuples. Consider Figure 9.12, which plots infor-
mation gain versus pattern frequency (support) for three UCI data sets. A theoretical
upper bound on information gain, which was derived analytically, is also plotted. The
figure shows that the discriminative power (assessed here as information gain) of low-
frequency patterns is bounded by a small value. This is due to the patterns' limited
coverage of the data set. Similarly, the discriminative power of very high-frequency pat-
terns is also bounded by a small value, which is due to their commonness in the data. The
upper bound on information gain is a function of pattern frequency, and it increases
monotonically with pattern frequency. These observations
can be confirmed analytically. Patterns with medium-large supports (e.g., support = 300
in Figure 9.12a) may be discriminative or not. Thus, not every frequent pattern is useful.
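To see why low-frequency patterns have bounded discriminative power, consider a two-class problem. The sketch below is an illustrative derivation under simplifying assumptions (a binary pattern feature that, in the best case, occurs only in tuples of one class, with support theta no larger than the class prior q), not the exact analysis behind Figure 9.12:

```python
import math

def entropy(p):
    """Binary entropy H(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def info_gain_upper_bound(theta, q):
    """Best-case information gain of a pattern with support theta in a
    two-class problem with positive-class prior q, assuming theta <= q and
    that the pattern occurs only in positive-class tuples."""
    # When the pattern is present, the class is pure: H(C | pattern) = 0.
    # When absent, the positive-class fraction is (q - theta) / (1 - theta).
    return entropy(q) - (1 - theta) * entropy((q - theta) / (1 - theta))

q = 0.5  # illustrative class prior
for theta in (0.01, 0.05, 0.1, 0.3, 0.5):
    print(f"support={theta:.2f}  IG upper bound={info_gain_upper_bound(theta, q):.3f}")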
5 The University of California at Irvine (UCI) archives several large data sets at http://kdd.ics.uci.edu/.
These are commonly used by researchers for the testing and comparison of machine learning and data
mining algorithms.
 