- Concurrent execution of several paradigms, followed by a combination of the
individual decisions each model gives for the case to be classified [31]. The
combination can be done by a voting approach (see the sketch after this list)
or by means of more complex approaches [11].
- Hybrid approaches, in which the foundations of two or more different
classification systems are implemented together in one classifier [14].
Underlying the hybrid approach is the concept of reductionism, whereby
complex problems are solved through stepwise decomposition [28].
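As an illustration of the first scheme, the following minimal sketch (in Python) combines the decisions of independently trained models by simple majority voting; the classifier objects and their predict method are hypothetical stand-ins for any trained models, not an interface defined in the paper.

from collections import Counter

def majority_vote(classifiers, case):
    # Combine independently trained classifiers by majority voting.
    # `classifiers` is any collection of objects exposing a hypothetical
    # predict(case) method; the class label predicted most often wins
    # (ties are broken arbitrarily by Counter ordering).
    votes = Counter(clf.predict(case) for clf in classifiers)
    return votes.most_common(1)[0][0]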
In this paper, we present a new hybrid classifier based on two families of
well-known classification methods: the first is a distance-based classifier [6]
and the second is the classification tree paradigm [3], which is combined
with the former in the classification process. The k-NN algorithm is used as a
preprocessing step to obtain a modified training database for the subsequent
induction of the classification tree structure. This modified database can
lead to the induction of a tree different from the one induced from the
original database. The two major differences are the choice of a different split
variable at some point in the tree and a different pruning decision at some
depth. We show the results obtained by the new approach and compare them with
the results obtained by the classification tree induction algorithm ID3 [23].
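The sketch below shows one plausible realization of this preprocessing step, under the assumption that the modification consists of replacing each training case's class with the majority class among its k nearest neighbours (leave-one-out, Euclidean distance); the paper's exact modification is detailed in Section 4, and the function names and data layout here are illustrative.

import math
from collections import Counter

def knn_label(train, x, k):
    # Majority class among the k nearest neighbours of x (Euclidean
    # distance); `train` is a list of (attribute-vector, label) pairs.
    neighbours = sorted(train, key=lambda t: math.dist(t[0], x))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

def knn_preprocess(train, k):
    # Return a modified training set: each case keeps its attribute
    # vector but takes the class assigned by k-NN over the remaining
    # cases; the tree inducer (e.g. ID3) is then run on this set.
    modified = []
    for i, (x, _) in enumerate(train):
        rest = train[:i] + train[i + 1:]   # leave-one-out
        modified.append((x, knn_label(rest, x, k)))
    return modified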
The rest of the paper is organized as follows. Section 2 reviews the decision
tree paradigm, while Section 3 presents the k-NN method. The new approach is
presented in Section 4, and the results obtained are shown in Section 5. The
final section is dedicated to conclusions and points out future work.
2 Decision Trees
A decision tree consists of nodes and branches that partition a set of samples
into a set of covering decision rules. In each node, a single test or decision
is made to obtain a partition. The starting node is usually referred to as the
root node; an illustration of this appears in Figure 1. In the terminal nodes,
or leaves, a decision is made on the class assignment. Figure 2 shows an
illustrative example of a classification tree obtained by the MineSet software
from SGI.
In each node, the main task is to select the attribute that best partitions
the classes of the samples in the training set. There are many different
measures for selecting the best attribute at a node of a decision tree; two
works gathering these measures are [19] and [16]. In more complex works, such
as [21], these tests are made by applying the linear discriminant approach at
each node. In the induction of a decision tree, a common problem is
overfitting to the training dataset, which produces an excessive expansion of
the tree and consequently a loss of predictive accuracy on new, unseen cases.
This problem is overcome in two ways:
- Weighing the discriminant capability of the selected attribute, and thus
discarding a possible further splitting of the dataset. This technique is
known as "prepruning" (see the sketch below).
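For reference, ID3 selects the split attribute by information gain; the sketch below computes it for samples stored as (attribute-dictionary, class-label) pairs, a data layout assumed here for illustration. A prepruning test in this style would refuse to split a node when even the best attribute's gain falls below a small threshold.

import math
from collections import Counter

def entropy(samples):
    # Shannon entropy of the class distribution in `samples`.
    counts = Counter(label for _, label in samples)
    total = len(samples)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def information_gain(samples, attr):
    # ID3's split measure: entropy reduction obtained by partitioning
    # `samples` on the values of attribute `attr`.
    total = len(samples)
    remainder = 0.0
    for value in {x[attr] for x, _ in samples}:
        subset = [(x, y) for x, y in samples if x[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(samples) - remainder

# A simple prepruning rule: stop expanding a node when the best
# attribute's information gain is below a chosen threshold.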