Evaluation of Classification Trees - Data Mining with Decision Trees: Theory and Applications

Chapter 4

Evaluation of Classification Trees

4.1 Overview

An important problem in the KDD process is the development of efficient indicators for assessing the quality of the analysis results. In this chapter, we introduce the main concepts and quality criteria in decision tree evaluation.

Evaluating the performance of a classification tree is a fundamental aspect of machine learning. As stated above, the decision tree inducer receives a training set as input and constructs a classification tree that can classify an unseen instance. Both the classification tree and the inducer can be assessed against evaluation criteria. Evaluation is important for understanding the quality of the classification tree and for refining parameters in the iterative KDD process.

While there are several criteria for evaluating the predictive performance of classification trees, other criteria such as the computational complexity or the comprehensibility of the generated classifier can be important as well.

4.2 Generalization Error

Let DT(S) represent a classification tree trained on dataset S. The generalization error of DT(S) is the probability that it misclassifies an instance selected according to the distribution D of the labeled instance space. The classification accuracy of a classification tree is one minus the generalization error. The training error is defined as the percentage of examples in the training set incorrectly classified by the classification tree, formally:

\[
\varepsilon(DT(S), S) = \sum_{\langle x, y \rangle \in S} L\bigl(y, DT(S)(x)\bigr) \tag{4.1}
\]
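As a rough illustration of Equation (4.1), the following Python sketch sums a loss over a labeled training sample. It rests on several assumptions that are not stated in this section: L is taken to be the zero-one loss (0 for a correct prediction, 1 for a misclassification), `toy_tree` is a hypothetical one-split tree standing in for DT(S), and the four-instance sample `S` is made up for the example. Dividing the sum by |S| to obtain a rate is also an assumption, made so that the result matches the "percentage" wording in the text.

```python
from typing import Callable, Hashable, Sequence, Tuple

Instance = Tuple[float, ...]
Label = Hashable
Classifier = Callable[[Instance], Label]
Sample = Sequence[Tuple[Instance, Label]]


def zero_one_loss(y_true: Label, y_pred: Label) -> int:
    """Assumed form of L: 0 if the prediction is correct, 1 otherwise."""
    return 0 if y_true == y_pred else 1


def training_error_count(classifier: Classifier, sample: Sample) -> int:
    """Sum of the loss over all <x, y> pairs in the sample, as in Eq. (4.1)."""
    return sum(zero_one_loss(y, classifier(x)) for x, y in sample)


def training_error_rate(classifier: Classifier, sample: Sample) -> float:
    """Loss sum divided by |S| (normalization is an assumption of this sketch)."""
    if not sample:
        raise ValueError("sample must be non-empty")
    return training_error_count(classifier, sample) / len(sample)


def toy_tree(x: Instance) -> Label:
    """Hypothetical stand-in for DT(S): a single split on the first attribute."""
    return "pos" if x[0] > 2.5 else "neg"


# Made-up training sample S of (instance, label) pairs.
S = [((1.0,), "neg"), ((2.0,), "neg"), ((3.0,), "pos"), ((4.0,), "neg")]

count = training_error_count(toy_tree, S)  # one instance, (4.0,), is misclassified
rate = training_error_rate(toy_tree, S)    # 1 / 4
accuracy = 1 - rate                        # accuracy measured on the training set
```

Applying the same functions to a labeled sample that was not used for training gives an empirical estimate of the generalization error rather than the training error.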
