Chapter 4
Evaluation of Classification Trees
4.1 Overview
An important problem in the KDD process is the development of efficient
indicators for assessing the quality of the analysis results. In this chapter, we
introduce the main concepts and quality criteria in decision tree evaluation.
Evaluating the performance of a classification tree is a fundamental
aspect of machine learning. As stated above, the decision tree inducer
receives a training set as input and constructs a classification tree that
can classify an unseen instance. Both the classification tree and the inducer
can be evaluated using evaluation criteria. The evaluation is important
for understanding the quality of the classification tree and for refining
parameters in the KDD iterative process.
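
The distinction between the inducer and the classification tree it produces can be made concrete with a minimal sketch. The example below is illustrative only: `induce_stump` is a hypothetical inducer that builds a one-split tree on a single numeric attribute, `misclassification_rate` is a hypothetical helper, and the data are made up. It is not the induction algorithm used elsewhere in this book.

```python
from collections import Counter


def induce_stump(training_set):
    """Hypothetical inducer: build a one-split classification tree (a stump).

    training_set is a list of (x, y) pairs, x numeric and y a class label.
    Returns a function that classifies a single, possibly unseen, instance.
    """
    if not training_set:
        raise ValueError("training set must not be empty")

    def majority(labels, default):
        # Most frequent label, or the default when the branch is empty.
        return Counter(labels).most_common(1)[0][0] if labels else default

    overall = majority([y for _, y in training_set], None)
    best = None  # (training mistakes, threshold, left label, right label)
    for threshold in sorted({x for x, _ in training_set}):
        left = majority([y for x, y in training_set if x <= threshold], overall)
        right = majority([y for x, y in training_set if x > threshold], overall)
        mistakes = sum(
            (left if x <= threshold else right) != y for x, y in training_set
        )
        if best is None or mistakes < best[0]:
            best = (mistakes, threshold, left, right)
    _, threshold, left, right = best

    def classify(x):
        return left if x <= threshold else right

    return classify


def misclassification_rate(classifier, instances):
    """Fraction of labeled instances that the classifier gets wrong."""
    wrong = sum(classifier(x) != y for x, y in instances)
    return wrong / len(instances)


# Made-up data, for illustration only.
training_set = [(1.0, "no"), (2.0, "no"), (3.0, "no"),
                (6.0, "yes"), (7.0, "yes"), (8.0, "yes")]
unseen = [(2.5, "no"), (4.0, "no"), (6.5, "yes"), (9.0, "yes")]

tree = induce_stump(training_set)                          # the inducer's output
train_rate = misclassification_rate(tree, training_set)    # on training instances
unseen_rate = misclassification_rate(tree, unseen)         # on unseen instances
print(train_rate, unseen_rate)
```

In this toy setup the tree fits its training set perfectly but still errs on one of the four unseen instances, which is why a tree is usually assessed on instances it was not trained on.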
While there are several criteria for evaluating the predictive performance
of classification trees, other criteria such as the computational
complexity or the comprehensibility of the generated classifier can be
important as well.
4.2 Generalization Error
Let DT(S) represent a classification tree trained on dataset S. The
generalization error of DT(S) is the probability that it misclassifies an
instance selected according to the distribution D of the labeled instance
space. The classification accuracy of a classification tree is one minus the
generalization error. The training error is defined as the percentage of
examples in the training set incorrectly classified by the classification
tree, formally:
\[
\varepsilon(DT(S), S) = \sum_{\langle x, y \rangle \in S} L(y, DT(S)(x)), \tag{4.1}
\]