Cost-sensitive Active and Proactive Learning of Decision Trees - Data Mining with Decision Trees: Theory and Applications


the same cost, which is seldom the case in real world problems. For instance,

when an expert in brain tumors receives a patient who suffers from a headache,

he does not recommend a scan as the first diagnostic test, although it is

the most effective and accurate one, because the expert has economic

criteria in mind. Therefore the expert asks simple questions and orders

other, less expensive tests in order to isolate the simplest cases, and only

recommends such an expensive test for the complex ones.

Effective learning algorithms should also take into consideration cost

in the concept learning process. Most of the currently available algorithms

for classification are designed to minimize zero-one loss or error rate: the

number of incorrect predictions made or, equivalently, the probability of

making an incorrect prediction. This implicitly assumes that all errors are

equally costly. But in most KDD applications this is far from the case.

According to Provost [ Provost and Fawcett (1997) ] “it is hard to imagine a

domain in which a learning system may be indifferent to whether it makes a

false positive or a false negative error.” Rarely are mistakes evenly weighted

in their cost. In such cases, accuracy maximization should be replaced with

cost minimization. In real-world applications of concept learning, there are

many different types of costs involved [ Turney (1995) ] . The majority of the

learning literature ignores all types of costs. The literature provides even

less guidance in situations where class distributions are imprecise or can be

changed [ Provost and Fawcett (1997) ] .
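
The difference between minimizing error rate and minimizing cost can be illustrated with a minimal sketch. It is not taken from the cited works: the function names, the class probabilities and the cost values below are assumptions chosen only for illustration. Given estimated class probabilities for one example and a misclassification cost matrix, the error-minimizing rule picks the most probable class, while the cost-minimizing rule picks the class with the lowest expected cost.

```python
from typing import Sequence


def min_error_decision(probs: Sequence[float]) -> int:
    """Zero-one loss: predict the most probable class."""
    return max(range(len(probs)), key=lambda k: probs[k])


def min_cost_decision(probs: Sequence[float],
                      cost: Sequence[Sequence[float]]) -> int:
    """Predict the class with the lowest expected misclassification cost.

    cost[i][j] is the (assumed) cost of predicting class j
    when the true class is i.
    """
    n = len(probs)
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=lambda j: expected[j])


# Hypothetical example: class 0 = "healthy", class 1 = "ill".
probs = [0.8, 0.2]
# Assumed costs: a false negative (true 1, predicted 0) is ten times
# as costly as a false positive (true 0, predicted 1).
unequal_cost = [[0.0, 1.0],
                [10.0, 0.0]]
# Equal costs reproduce the zero-one loss decision.
equal_cost = [[0.0, 1.0],
              [1.0, 0.0]]

print(min_error_decision(probs))               # most probable class
print(min_cost_decision(probs, equal_cost))    # same decision as zero-one loss
print(min_cost_decision(probs, unequal_cost))  # decision shifts to the costly class
```

With equal costs both rules agree; with the assumed unequal costs, the expected cost of predicting "healthy" (0.2 × 10 = 2.0) exceeds that of predicting "ill" (0.8 × 1 = 0.8), so the cost-minimizing rule predicts the less probable class.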

Countless research results have been published based on comparisons

of classifier accuracy over benchmark data sets. Comparing accuracies on

benchmark data sets says little, if anything, about classifier performance

on real-world tasks [ Provost and Fawcett (1998) ] . Many learning programs

create procedures whose goal is to minimize the number of errors made

when predicting the classification of unseen examples. Few papers have

investigated the cost of misclassification errors [ Provost (1994) ] and very

few papers have examined the many other types of costs.

12.2 Type of Costs

A detailed bibliography of the different types of costs can be found in

[ Turney (2000) ] . The term cost is interpreted in its broadest meaning.

Cost may be measured in many different units, such as monetary units

(dollars), temporal units (seconds), or abstract units of utility. A benefit

can be considered as negative cost. A taxonomy for costs is presented in

[ Turney (2000) ] , and it consists of the following types:
