have demonstrated the existence and correctness of the “conservation law” [Schaffer (1994)] or “no free lunch theorem” [Wolpert (1996)]: if one inducer is better than another in some domains, then there are necessarily other domains in which this relationship is reversed.
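Stated schematically (this is a sketch of the flavor of Wolpert's result rather than its exact formulation; C denotes off-training-set error, d a fixed training set, and f the target function), for any two inducers A and B,

\[
\sum_{f} E\big(C \mid f, d, A\big) \;=\; \sum_{f} E\big(C \mid f, d, B\big),
\]

i.e., once performance is uniformly averaged over all mathematically possible target functions, no inducer is distinguishable from any other.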
The “no free lunch theorem” implies that for a given problem, a
certain approach can yield more information from the same data than other
approaches.
A distinction should be made between all the mathematically possible domains, which are simply a product of the representation languages used, and the domains that occur in the real world, and are therefore the ones of primary interest [Rao et al. (1995)]. Without doubt there are many domains in the former set that are not in the latter, and average accuracy in the real-world domains can be increased at the expense of accuracy in the domains that never occur in practice. Indeed, achieving this is the goal of inductive learning research. It is still true that some algorithms will match certain classes of naturally occurring domains better than other algorithms, and so achieve higher accuracy than those algorithms. While this may be reversed in other real-world domains, it does not preclude an improved algorithm from being as accurate as the best in each of the domain classes.
Indeed, in many application domains, the generalization error of even
the best methods is far above 0%, and the question of whether it can be
improved, and if so how, is an open and important one. One aspect in
answering this question is determining the minimum error achievable by
any classifier in the application domain (known as the optimal Bayes error).
If existing classifiers do not reach this level, new approaches are needed.
Although this problem has received considerable attention (see for instance [Tumer and Ghosh (1996)]), no generally reliable method has so far been demonstrated.
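As a point of reference (a sketch in standard notation, not drawn from the sources cited above), for a feature vector x and class label y the optimal Bayes error can be written as

\[
\varepsilon_{\mathrm{Bayes}} \;=\; \mathbb{E}_{x}\Big[\, 1 - \max_{y} P(y \mid x) \,\Big],
\]

the expected probability mass of all classes other than the most probable one at each point. No classifier can achieve a lower error on that domain, which is why estimating this quantity indicates whether improved methods are possible at all.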
The “no free lunch” concept presents a dilemma to the analyst
approaching a new task: Which inducer should be used?
If the analyst is looking for accuracy only, one solution is to try each inducer in turn, estimate its generalization error, and choose the one that appears to perform best [Schaffer (1994)].
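A minimal sketch of this selection-by-estimation strategy, assuming a scikit-learn environment with k-fold cross-validation as the error estimator; the candidate inducers and the dataset are illustrative choices, not ones prescribed here:

    # Sketch: choose the inducer with the lowest estimated generalization error.
    # Assumes scikit-learn; the candidates and dataset are illustrative only.
    from sklearn.datasets import load_iris
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)

    candidates = {
        "decision tree": DecisionTreeClassifier(random_state=0),
        "naive Bayes": GaussianNB(),
        "3-NN": KNeighborsClassifier(n_neighbors=3),
    }

    # Estimate each inducer's generalization error by 10-fold cross-validation.
    errors = {
        name: 1.0 - cross_val_score(clf, X, y, cv=10).mean()
        for name, clf in candidates.items()
    }

    best = min(errors, key=errors.get)
    for name, err in sorted(errors.items(), key=lambda kv: kv[1]):
        print(f"{name:15s} estimated error = {err:.3f}")
    print(f"selected inducer: {best}")

Note that the cross-validation score is itself only an estimate of the generalization error, so selecting among many candidates on this basis carries its own risk of overfitting the estimate.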
Another approach, known as multistrategy learning [Michalski and Tecuci (1994)], attempts to combine two or more different paradigms in a single algorithm. Most research in this area has been concerned with combining empirical approaches with analytical methods (see for instance [Towell and Shavlik (1994)]). Ideally, a multistrategy learning algorithm would always perform as well as the best