[Figure 9.13 schematic: (a) two-step approach: data set → mine frequent patterns → select discriminative patterns; (b) direct approach (DDPMine): data set → transform into a compact tree → search for discriminative patterns.]
Figure 9.13 A framework for frequent pattern-based classification: (a) a two-step general approach
versus (b) the direct approach of DDPMine.
To improve the efficiency of the general framework, consider condensing steps 1 and
2 into just one step. That is, rather than generating the complete set of frequent patterns,
it's possible to mine only the highly discriminative ones. This more direct approach
is referred to as direct discriminative pattern mining. The DDPMine algorithm follows
this approach, as illustrated in Figure 9.13(b). It first transforms the training data into
a compact tree structure known as a frequent pattern tree, or FP-tree (Section 6.2.4),
which holds all of the attribute-value (itemset) association information. It then searches
for discriminative patterns on the tree. The approach is direct in that it avoids generating a large number of indiscriminative patterns. It incrementally reduces the problem
by eliminating training tuples, thereby progressively shrinking the FP-tree. This further
speeds up the mining process.
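To make the overall loop concrete, the following is a minimal Python sketch of the idea, under simplifying assumptions: two classes labeled 1 and 0, plain information gain as the discriminative measure, and a brute-force enumeration of short candidate itemsets standing in for the FP-tree construction and branch-and-bound search described above. All names and parameters (info_gain, mine_discriminative_patterns, min_support, max_patterns, max_len) are illustrative, not taken from the published DDPMine implementation.

```python
from itertools import combinations
from math import log2


def info_gain(n_pos, n_neg, n_pos_cov, n_neg_cov):
    """Information gain of splitting the data on 'pattern present vs. absent'."""
    def h(p, n):
        tot = p + n
        if tot == 0 or p == 0 or n == 0:
            return 0.0
        fp, fn = p / tot, n / tot
        return -fp * log2(fp) - fn * log2(fn)

    total = n_pos + n_neg
    cov = n_pos_cov + n_neg_cov
    h_split = ((cov / total) * h(n_pos_cov, n_neg_cov)
               + ((total - cov) / total) * h(n_pos - n_pos_cov, n_neg - n_neg_cov))
    return h(n_pos, n_neg) - h_split


def mine_discriminative_patterns(tuples, labels, min_support, max_patterns, max_len=2):
    """Mine one highly discriminative itemset per round, then remove the
    training tuples it covers so the next round works on a smaller data set."""
    data = [(frozenset(t), y) for t, y in zip(tuples, labels)]
    patterns = []

    while data and len(patterns) < max_patterns:
        n_pos = sum(1 for _, y in data if y == 1)
        n_neg = len(data) - n_pos

        # Brute-force stand-in for the FP-tree + branch-and-bound search:
        # score every itemset of up to max_len items that meets min_support.
        items = sorted({i for t, _ in data for i in t})
        best, best_gain = None, 0.0
        for k in range(1, max_len + 1):
            for cand in combinations(items, k):
                cand = frozenset(cand)
                covered = [y for t, y in data if cand <= t]
                if len(covered) < min_support:
                    continue
                gain = info_gain(n_pos, n_neg,
                                 sum(1 for y in covered if y == 1),
                                 sum(1 for y in covered if y == 0))
                if gain > best_gain:
                    best, best_gain = cand, gain
        if best is None:
            break
        patterns.append(best)

        # Progressively shrink the training set: drop covered tuples.
        data = [(t, y) for t, y in data if not best <= t]

    return patterns
```

For example, calling mine_discriminative_patterns on the four tuples {a, b}, {a, c}, {b, c}, {c} with labels 1, 1, 0, 0, min_support=2, and max_patterns=2 first selects {a}, which covers exactly the two positive tuples, and then stops because the remaining tuples all belong to one class.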
By choosing to transform the original data to an FP-tree, DDPMine avoids gener-
ating redundant patterns because an FP-tree stores only the closed frequent patterns.
By definition, any subpattern, β, of a closed pattern, α, is redundant with respect to α (Section 6.1.2). DDPMine directly mines the discriminative patterns and integrates feature selection into the mining framework. The theoretical upper bound on information gain is used to facilitate a branch-and-bound search, which prunes the search space significantly. Experimental results show that DDPMine achieves orders-of-magnitude speedup over the two-step approach with no decline in classification accuracy.
DDPMine also outperforms state-of-the-art associative classification methods in terms
of both accuracy and efficiency.
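The role of a support-based upper bound in such a branch-and-bound search can be illustrated with a small, self-contained sketch. The bound below is one simple bound of this kind for a two-class problem: no pattern with relative support theta can gain more information than an ideal pattern of the same support whose covered tuples are as pure as possible. The function names and the exact form of the bound are illustrative assumptions for this sketch, not the formula proved in the DDPMine work.

```python
from math import log2


def binary_entropy(p):
    """Entropy (in bits) of a two-class distribution with positive rate p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def info_gain_upper_bound(theta, p):
    """Upper bound on the information gain of any pattern whose relative
    support is theta, for a two-class problem with positive-class prior p.

    Conditional entropy is concave in the number of positive tuples the
    pattern covers, so it is minimized (and information gain maximized)
    at one of the two extreme, purest splits; evaluating both extremes
    therefore yields a valid bound.
    """
    if theta <= 0.0 or theta >= 1.0:
        return 0.0

    def cond_entropy(x):
        # x = fraction of all tuples that are covered by the pattern AND positive
        return (theta * binary_entropy(x / theta)
                + (1 - theta) * binary_entropy((p - x) / (1 - theta)))

    x_hi = min(theta, p)              # covered tuples as positive as possible
    x_lo = max(0.0, theta - (1 - p))  # covered tuples as negative as possible
    return binary_entropy(p) - min(cond_entropy(x_lo), cond_entropy(x_hi))


# Pruning use: supersets of a pattern can only have smaller support, and this
# bound is non-decreasing in theta up to the minority-class prior, so on that
# range a branch whose current support already yields a bound below the best
# information gain found so far cannot contain a better pattern and can be skipped.
```

For instance, with a positive-class prior of 0.3, this sketch gives a bound of roughly 0.09 bits at 5% support but reaches the full class entropy of about 0.88 bits at 30% support, which is why low-support branches are the ones most easily pruned.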
9.5 Lazy Learners (or Learning from Your Neighbors)
The classification methods discussed so far in this chapter—decision tree induction,
Bayesian classification, rule-based classification, classification by backpropagation,
support vector machines, and classification based on association rule mining—are all examples of eager learners.
 