[Figure 9.13 schematic: (a) two-step approach: data set → mine frequent patterns → select discriminative patterns; (b) direct approach (DDPMine): data set → transform into a compact tree → search for discriminative patterns.]
Figure 9.13 A framework for frequent pattern-based classification: (a) a two-step general approach
versus (b) the direct approach of DDPMine.
To improve the efficiency of the general framework, consider condensing steps 1 and
2 into just one step. That is, rather than generating the complete set of frequent patterns,
it's possible to mine only the highly discriminative ones. This more direct approach
is referred to as direct discriminative pattern mining. The DDPMine algorithm follows
this approach, as illustrated in Figure 9.13(b). It first transforms the training data into
a compact tree structure known as a frequent pattern tree, or FP-tree (Section 6.2.4),
which holds all of the attribute-value (itemset) association information. It then searches
for discriminative patterns on the tree. The approach is direct in that it avoids generating a large number of indiscriminative patterns. It incrementally reduces the problem
by eliminating training tuples, thereby progressively shrinking the FP-tree. This further
speeds up the mining process.
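To make the overall loop concrete, the following is a minimal Python sketch of the idea, under simplifying assumptions: two classes labeled 1 and 0, plain information gain as the discriminative measure, and a brute-force enumeration of short candidate itemsets standing in for the FP-tree construction and branch-and-bound search described above. All names and parameters (info_gain, mine_discriminative_patterns, min_support, max_patterns, max_len) are illustrative, not taken from the published DDPMine implementation.

```python
from itertools import combinations
from math import log2


def info_gain(n_pos, n_neg, n_pos_cov, n_neg_cov):
    """Information gain of splitting the data on 'pattern present vs. absent'."""
    def h(p, n):
        tot = p + n
        if tot == 0 or p == 0 or n == 0:
            return 0.0
        fp, fn = p / tot, n / tot
        return -fp * log2(fp) - fn * log2(fn)

    total = n_pos + n_neg
    cov = n_pos_cov + n_neg_cov
    h_split = ((cov / total) * h(n_pos_cov, n_neg_cov)
               + ((total - cov) / total) * h(n_pos - n_pos_cov, n_neg - n_neg_cov))
    return h(n_pos, n_neg) - h_split


def mine_discriminative_patterns(tuples, labels, min_support, max_patterns, max_len=2):
    """Mine one highly discriminative itemset per round, then remove the
    training tuples it covers so the next round works on a smaller data set."""
    data = [(frozenset(t), y) for t, y in zip(tuples, labels)]
    patterns = []

    while data and len(patterns) < max_patterns:
        n_pos = sum(1 for _, y in data if y == 1)
        n_neg = len(data) - n_pos

        # Brute-force stand-in for the FP-tree + branch-and-bound search:
        # score every itemset of up to max_len items that meets min_support.
        items = sorted({i for t, _ in data for i in t})
        best, best_gain = None, 0.0
        for k in range(1, max_len + 1):
            for cand in combinations(items, k):
                cand = frozenset(cand)
                covered = [y for t, y in data if cand <= t]
                if len(covered) < min_support:
                    continue
                gain = info_gain(n_pos, n_neg,
                                 sum(1 for y in covered if y == 1),
                                 sum(1 for y in covered if y == 0))
                if gain > best_gain:
                    best, best_gain = cand, gain
        if best is None:
            break
        patterns.append(best)

        # Progressively shrink the training set: drop covered tuples.
        data = [(t, y) for t, y in data if not best <= t]

    return patterns
```

For example, calling mine_discriminative_patterns on the four tuples {a, b}, {a, c}, {b, c}, {c} with labels 1, 1, 0, 0, min_support=2, and max_patterns=2 first selects {a}, which covers exactly the two positive tuples, and then stops because the remaining tuples all belong to one class.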
By choosing to transform the original data to an FP-tree, DDPMine avoids gener-
ating redundant patterns because an FP-tree stores only the closed frequent patterns.
By definition, any subpattern, β, of a closed pattern, α, is redundant with respect to α (Section 6.1.2). DDPMine directly mines the discriminative patterns and integrates feature selection into the mining framework. The theoretical upper bound on information gain is used to facilitate a branch-and-bound search, which prunes the search space significantly. Experimental results show that DDPMine achieves orders-of-magnitude speedup over the two-step approach with no decline in classification accuracy.
DDPMine also outperforms state-of-the-art associative classification methods in terms
of both accuracy and efficiency.
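The role of a support-based upper bound in such a branch-and-bound search can be illustrated with a small, self-contained sketch. The bound below is one simple bound of this kind for a two-class problem: no pattern with relative support theta can gain more information than an ideal pattern of the same support whose covered tuples are as pure as possible. The function names and the exact form of the bound are illustrative assumptions for this sketch, not the formula proved in the DDPMine work.

```python
from math import log2


def binary_entropy(p):
    """Entropy (in bits) of a two-class distribution with positive rate p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def info_gain_upper_bound(theta, p):
    """Upper bound on the information gain of any pattern whose relative
    support is theta, for a two-class problem with positive-class prior p.

    Conditional entropy is concave in the number of positive tuples the
    pattern covers, so it is minimized (and information gain maximized)
    at one of the two extreme, purest splits; evaluating both extremes
    therefore yields a valid bound.
    """
    if theta <= 0.0 or theta >= 1.0:
        return 0.0

    def cond_entropy(x):
        # x = fraction of all tuples that are covered by the pattern AND positive
        return (theta * binary_entropy(x / theta)
                + (1 - theta) * binary_entropy((p - x) / (1 - theta)))

    x_hi = min(theta, p)              # covered tuples as positive as possible
    x_lo = max(0.0, theta - (1 - p))  # covered tuples as negative as possible
    return binary_entropy(p) - min(cond_entropy(x_lo), cond_entropy(x_hi))


# Pruning use: supersets of a pattern can only have smaller support, and this
# bound is non-decreasing in theta up to the minority-class prior, so on that
# range a branch whose current support already yields a bound below the best
# information gain found so far cannot contain a better pattern and can be skipped.
```

For instance, with a positive-class prior of 0.3, this sketch gives a bound of roughly 0.09 bits at 5% support but reaches the full class entropy of about 0.88 bits at 30% support, which is why low-support branches are the ones most easily pruned.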
9.5 Lazy Learners (or Learning from Your Neighbors)
The classification methods discussed so far in this chapter—decision tree induction,
Bayesian classification, rule-based classification, classification by backpropagation,
support vector machines, and classification based on association rule mining—are all examples of eager learners.
 