estimations, which are calculated by a cross-validation procedure. Naive
Bayes classification, by itself, is very efficient in terms of its processing
time. However, using cross-validation significantly increases the overall com-
putational complexity. Although Kohavi used naive Bayes to produce
the classifiers, other classification methods are also applicable. However,
due to the cross-validation estimations, NBTree becomes computationally
expensive for methods that are more time-consuming than naive Bayes (e.g.
neural networks).
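To make this cost trade-off concrete, the sketch below (which assumes scikit-learn and its bundled iris dataset, neither of which appears in the original text) estimates the accuracy of a naive Bayes classifier by cross-validation. NBTree performs this kind of estimation repeatedly while growing the tree, so the number of folds multiplies the training cost of whatever embedded method is used.

```python
# A minimal sketch of cross-validated accuracy estimation, assuming
# scikit-learn; illustrative only, not Kohavi's NBTree implementation.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Five train/test rounds: the classifier is trained five times, which is
# cheap for naive Bayes but costly for slower learners.
scores = cross_val_score(GaussianNB(), X, y, cv=5)
print(f"estimated accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Replacing GaussianNB with a slower learner, such as a neural network, multiplies that learner's full training time by the number of folds at every point where such an estimate is required.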
We describe a simple framework for decision tree ISD, termed decision
tree framework for instance space decomposition (DFID). The framework
hierarchically partitions the instance space using a top-down (pruning-
free) decision tree procedure. Although various DFID implementations use
different stopping rules, split-validation examinations and splitting rules,
in this chapter we concentrate on a specific DFID method — contrasted
populations miner (CPOM). The splitting rule that this method uses —
grouped gain ratio — combines the well-accepted gain ratio criterion with
a heuristic grouping procedure. CPOM can reduce the processing time while
keeping the composite classifier accurate.
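For reference, the sketch below shows the plain gain ratio criterion that grouped gain ratio builds upon: Quinlan's information gain divided by the split information of the candidate partition. The grouping heuristic itself is not shown, and the function names are ours rather than the chapter's.

```python
# A minimal sketch of the gain ratio criterion for a categorical split;
# CPOM's grouped gain ratio adds a value-grouping heuristic on top of this.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Information gain: entropy reduction achieved by the partition.
    gain = entropy(labels) - sum(len(g) / n * entropy(g)
                                 for g in groups.values())
    # Split information: entropy of the partition itself, which penalizes
    # attributes that fragment the data into many small subsets.
    split_info = -sum(len(g) / n * log2(len(g) / n)
                      for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

# Example: a perfectly informative binary attribute scores 1.0.
print(gain_ratio(["a", "a", "b", "b"], [0, 0, 1, 1]))
```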
Implementations of DFID consist of a decision tree (as a wrapper) and
an embedded classification method (this method can, in principle,
also be a decision tree). The embedded classification method generates the
multiple classifiers for the tree's leaves. The DFID sequence is illustrated
by the pseudo code in Figure 14.1. DFID inputs are: training instances; a
list of attributes (which will be examined as candidates for splitting the
decision tree); a classification method, denoted I; and, optionally (depending on the
specific implementation), some additional parameters.
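The pseudo code of Figure 14.1 is not reproduced here; the following simplified sketch conveys the same sequence under stated assumptions: a toy stopping rule, a placeholder splitting rule where CPOM would apply grouped gain ratio, no split-validation examinations, and rules restricted to simple conjunctions rather than full conjunctive normal form. All names are illustrative, not the authors' implementation.

```python
# A simplified DFID-style sketch; hypothetical names, not the chapter's code.
from collections import Counter, defaultdict

class Node:
    def __init__(self, rule):
        self.rule = rule            # conjunction of (attribute, value) tests
        self.children = {}          # attribute value -> child node
        self.classifier = None      # classifier attached at a leaf
        self.split_attr = None

def stopping_criterion(X, y, attributes):
    # Toy stopping rules: no attributes left, a pure node, or too few
    # instances to justify a further split.
    return not attributes or len(set(y)) == 1 or len(y) < 5

def dfid(X, y, attributes, make_classifier, rule=()):
    node = Node(rule)
    if stopping_criterion(X, y, attributes):
        # Stopping rules met: train the embedded method on all instances
        # reaching this node and attach the resulting classifier.
        node.classifier = make_classifier(X, y)
        return node
    node.split_attr = attributes[0]   # placeholder; CPOM: grouped gain ratio
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[xi[node.split_attr]].append((xi, yi))
    rest = [a for a in attributes if a != node.split_attr]
    for value, pairs in groups.items():
        sub_X, sub_y = map(list, zip(*pairs))
        node.children[value] = dfid(sub_X, sub_y, rest, make_classifier,
                                    rule + ((node.split_attr, value),))
    return node

# Usage with a majority-class stub as the embedded classification method:
X = [(0, 0), (0, 1), (1, 0), (1, 1)] * 3
y = [0, 0, 1, 1] * 3
tree = dfid(X, y, attributes=[0, 1],
            make_classifier=lambda X, y: Counter(y).most_common(1)[0][0])
```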
The procedure begins by creating the decision tree's root node. The
root represents the entire instance space X. When constructed, each node
is attached with a rule which defines the sub-space of X that the node
represents. The DFID framework considers rules that can be expressed
in a conjunctive normal form. A rule may be, for example: “(A1 = 3 ∨
A1 = 4) ∧ A2 = 1”. DFID then checks whether there should be a split
from the root node (i.e. whether X should be partitioned). This check,
which uses some stopping rules, is represented in Figure 14.1 by the general
function StoppingCriterion. The function receives some inputs (depending
on the specific implementation) and returns a Boolean value that indicates
whether the stopping rules are met. If the stopping rules are met, then I
is trained using all of the training instances. The classifier that results is
attached to the root node and the procedure terminates. If, however, the