The altered prior requires converting a cost matrix cost(j, i) into a cost
vector C(j), resulting in a single quantity that represents the importance of
avoiding a particular type of error. Performing this conversion accurately is
non-trivial, since it depends both on the frequency of examples of each class
and on the frequency with which an example of one class might be mistaken for
another. There are different ways to overcome the difficulties of estimating
the class probability. Bayesian classifiers estimate the probability that an
example belongs to each class. Similarly, a decision tree can also produce
an estimate of the probability that an example belongs to each class, given
that it is classified by a particular leaf.
p(y = i) = (N_i + 1) / (k + Σ_j N_j)    (12.6)

where N_i is the number of training examples of class i that reach the leaf
and k is the number of classes.
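The Laplace-corrected estimate of Eq. (12.6) can be sketched as follows; the function name and the example counts are illustrative, not from the text:

```python
# Laplace-corrected class-probability estimate at a decision tree leaf,
# per Eq. (12.6): p(y = i) = (N_i + 1) / (k + sum_j N_j).
# `counts[i]` holds N_i, the number of training examples of class i
# reaching the leaf; k is the number of classes.
def leaf_class_probabilities(counts):
    k = len(counts)
    total = sum(counts)
    return [(n_i + 1) / (k + total) for n_i in counts]

# A leaf that saw 8 examples of class 0 and 2 of class 1:
probs = leaf_class_probabilities([8, 2])
# probs == [0.75, 0.25], i.e. (8+1)/(2+10) and (2+1)/(2+10)
```

The +1 in the numerator and +k in the denominator keep the estimate away from 0 and 1 at small leaves, which matters when the probabilities are later multiplied by costs.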
The above equation illustrates one of the main existing methods for dealing
with cost. In general, these methods can be divided into three main categories.
The first category is a group of methods that replace the accuracy-based
criterion (e.g., information gain) with a cost-based criterion, or with a
combination of accuracy and cost. However, methods in this category sacrifice
accuracy for cost. They also require the user to select a particular
classifier whose characteristics may not suit his domain.
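As a minimal sketch of this first category, a split can be scored by the reduction in expected misclassification cost rather than by information gain. The function names and the cost matrix below are illustrative assumptions, not taken from the text:

```python
# Cost-based splitting criterion (sketch): score a split by the reduction
# in expected misclassification cost instead of information gain.
# `cost[j][i]` is the assumed cost of predicting class j when the true
# class is i; `counts` holds the class counts N_i at a node.
def node_cost(counts, cost):
    # Cost of the best single class prediction for this node.
    return min(sum(n * cost[j][i] for i, n in enumerate(counts))
               for j in range(len(cost)))

def cost_reduction(parent_counts, children_counts, cost):
    return node_cost(parent_counts, cost) - sum(
        node_cost(c, cost) for c in children_counts)

# With uniform 0/1 costs this reduces to a plain error-count reduction:
cost01 = [[0, 1], [1, 0]]
gain = cost_reduction([5, 5], [[5, 1], [0, 4]], cost01)
# parent cost 5, children costs 1 and 0, so the reduction is 4
```

With an asymmetric cost matrix, the same criterion prefers splits that isolate the expensive-to-misclassify class, which is exactly the trade of accuracy for cost described above.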
The second group is stratifying methods [Provost and Fawcett (1998)],
which change the frequency of classes in the training data in proportion
to their cost. However, this approach has several shortcomings. It distorts
the distribution of examples, which may seriously affect the performance of
some algorithms; it reduces the data available for learning if stratifying is
carried out by under-sampling; and it increases the learning time if it is
done by over-sampling. The third category is a group of wrapping methods,
which convert error-based classifiers into cost-sensitive ones. These methods
minimize cost even though the underlying classifier seeks to minimize
zero-one loss. They also assume that the user has chosen a particular
classifier because its characteristics are well suited to the domain. The
vast majority of the methods (of all three categories) fail to incorporate
widely available domain knowledge into their algorithms, which leaves many
of them in the theoretical realm. The few that do consider domain knowledge
and deal with cost take into consideration only the misclassification cost.
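The core idea behind the wrapping methods of the third category can be sketched as follows: given class-probability estimates p(i|x) from any error-based classifier, predict the class j that minimizes the expected cost Σ_i p(i|x)·cost(j, i). The function name and the example costs are illustrative assumptions:

```python
# Minimum-expected-cost prediction (sketch of the wrapping idea):
# `probs[i]` is an estimate of p(i|x) from any error-based classifier,
# and `cost[j][i]` is the assumed cost of predicting j when the truth is i.
def min_expected_cost_class(probs, cost):
    k = len(cost)
    expected = [sum(probs[i] * cost[j][i] for i in range(k))
                for j in range(k)]
    return min(range(k), key=expected.__getitem__)

# With asymmetric costs, the chosen class need not be the most probable one:
# predicting class 1 costs 0.7 * 1 = 0.7, predicting class 0 costs
# 0.3 * 10 = 3, so class 1 is chosen despite p(0|x) = 0.7.
cost = [[0, 10], [1, 0]]
print(min_expected_cost_class([0.7, 0.3], cost))  # -> 1
```

This relabeling rule is what lets a wrapper stay agnostic about the underlying inducer: only the probability estimates and the cost matrix are needed.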
The MetaCost algorithm [Domingos (1999)] is a wrapper method that can
make any inducer, and in particular decision tree inducers, cost-sensitive.
Specifically, MetaCost forms multiple bootstrap replicates of