The altered prior requires converting a cost matrix cost(j, i) into a cost
vector C(j), resulting in a single quantity that represents the importance of
avoiding a particular type of error. Performing this conversion accurately is
non-trivial, since it depends both on the frequency of examples of each class
and on the frequency with which an example of one class might be mistaken for
another. There are different ways to overcome the difficulties of estimating
the class probability. Bayesian classifiers estimate the probability that an
example belongs to each class. Similarly, a decision tree can also produce
an estimate of the probability that an example belongs to each class, given
that it is classified by a particular leaf.
p(y = i) = (N_i + 1) / (k + Σ_j N_j)    (12.6)

where N_i is the number of training examples of class i that reach the leaf
and k is the number of classes.
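The Laplace-corrected estimate of Eq. (12.6) can be sketched as follows; the function name and the example counts are illustrative, not from the text:

```python
# Laplace-corrected class-probability estimate at a decision tree leaf,
# per Eq. (12.6): p(y = i) = (N_i + 1) / (k + sum_j N_j).
# `counts[i]` holds N_i, the number of training examples of class i
# reaching the leaf; k is the number of classes.
def leaf_class_probabilities(counts):
    k = len(counts)
    total = sum(counts)
    return [(n_i + 1) / (k + total) for n_i in counts]

# A leaf that saw 8 examples of class 0 and 2 of class 1:
probs = leaf_class_probabilities([8, 2])
# probs == [0.75, 0.25], i.e. (8+1)/(2+10) and (2+1)/(2+10)
```

The +1 in the numerator and +k in the denominator keep the estimate away from 0 and 1 at small leaves, which matters when the probabilities are later multiplied by costs.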
The above equation illustrates one of the main existing methods for dealing
with cost. In general, these methods can be divided into three main categories.
The first category is a group of methods that replace the accuracy-based
criterion (e.g., information gain) with a cost-based criterion, or with a
combination of accuracy and cost. However, methods in this category sacrifice
accuracy for cost. They also require the user to select a particular
classifier whose characteristics may not suit his domain.
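As a minimal sketch of this first category, a split can be scored by the reduction in expected misclassification cost rather than by information gain. The function names and the cost matrix below are illustrative assumptions, not taken from the text:

```python
# Cost-based splitting criterion (sketch): score a split by the reduction
# in expected misclassification cost instead of information gain.
# `cost[j][i]` is the assumed cost of predicting class j when the true
# class is i; `counts` holds the class counts N_i at a node.
def node_cost(counts, cost):
    # Cost of the best single class prediction for this node.
    return min(sum(n * cost[j][i] for i, n in enumerate(counts))
               for j in range(len(cost)))

def cost_reduction(parent_counts, children_counts, cost):
    return node_cost(parent_counts, cost) - sum(
        node_cost(c, cost) for c in children_counts)

# With uniform 0/1 costs this reduces to a plain error-count reduction:
cost01 = [[0, 1], [1, 0]]
gain = cost_reduction([5, 5], [[5, 1], [0, 4]], cost01)
# parent cost 5, children costs 1 and 0, so the reduction is 4
```

With an asymmetric cost matrix, the same criterion prefers splits that isolate the expensive-to-misclassify class, which is exactly the trade of accuracy for cost described above.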
The second group is stratifying methods [Provost and Fawcett (1998)],
which change the frequency of classes in the training data in proportion
to their cost. However, this approach has several shortcomings. It distorts
the distribution of examples, which may seriously affect the performance of
some algorithms; it reduces the data available for learning if stratifying is
carried out by under-sampling; and it increases the learning time if it is
done by over-sampling. The third category is a group of wrapping methods,
which convert error-based classifiers into cost-sensitive ones. These methods
minimize cost even though the underlying classifier seeks to minimize
zero-one loss. They also assume that the user has chosen a particular
classifier because its characteristics are well suited to the domain. The
vast majority of the methods (of all three categories) fail to incorporate
widely available domain knowledge into their algorithms, which leaves many
of them in the theoretical realm. The few that do consider domain knowledge
and deal with cost take into consideration only the misclassification cost.
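The core idea behind the wrapping methods of the third category can be sketched as follows: given class-probability estimates p(i|x) from any error-based classifier, predict the class j that minimizes the expected cost Σ_i p(i|x)·cost(j, i). The function name and the example costs are illustrative assumptions:

```python
# Minimum-expected-cost prediction (sketch of the wrapping idea):
# `probs[i]` is an estimate of p(i|x) from any error-based classifier,
# and `cost[j][i]` is the assumed cost of predicting j when the truth is i.
def min_expected_cost_class(probs, cost):
    k = len(cost)
    expected = [sum(probs[i] * cost[j][i] for i in range(k))
                for j in range(k)]
    return min(range(k), key=expected.__getitem__)

# With asymmetric costs, the chosen class need not be the most probable one:
# predicting class 1 costs 0.7 * 1 = 0.7, predicting class 0 costs
# 0.3 * 10 = 3, so class 1 is chosen despite p(0|x) = 0.7.
cost = [[0, 10], [1, 0]]
print(min_expected_cost_class([0.7, 0.3], cost))  # -> 1
```

This relabeling rule is what lets a wrapper stay agnostic about the underlying inducer: only the probability estimates and the cost matrix are needed.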
The MetaCost algorithm [Domingos (1999)] is a wrapper method that can
make any inducer, and in particular decision tree inducers, cost-sensitive.
Specifically, MetaCost forms multiple bootstrap replicates of