secondary. Prioritization of these criteria agrees with the assumption that
the exploitation phase is longer than the exploration phase.
Assuming that we use a decision tree as the classifier, we are able to
estimate the probability $p_i$ by locating the appropriate leaf $k$ in the tree
that refers to the current instance $x_i$. The frequency vector of each leaf node
captures the number of instances from each possible class. In the usual case
of target marketing, the frequency vector has the form $(m_{k,\text{accept}}, m_{k,\text{reject}})$,
where $m_{k,c}$ denotes the number of instances in the labeled pool that reach
leaf $k$ and satisfy $y = c$. According to Laplace's law of succession, the
probability $p_i$ is estimated as:

$$p_i = p(m_{k,\text{accept}}, m_{k,\text{reject}}) = \frac{m_{k,\text{accept}} + 1}{m_{k,\text{accept}} + m_{k,\text{reject}} + 2}. \qquad (12.10)$$
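The Laplace correction in Eq. (12.10) can be sketched in a few lines; the function and parameter names below are illustrative, not part of the original method:

```python
def laplace_probability(m_accept: int, m_reject: int) -> float:
    """Estimate p_i from a leaf's frequency vector using Laplace's
    law of succession: (m_accept + 1) / (m_accept + m_reject + 2)."""
    return (m_accept + 1) / (m_accept + m_reject + 2)

# An empty leaf yields the uninformative prior 0.5, while the
# observed counts dominate as the leaf accumulates customers.
print(laplace_probability(0, 0))   # 0.5
print(laplace_probability(8, 2))   # (8 + 1) / (8 + 2 + 2) = 0.75
```

Note that the correction keeps the estimate strictly between 0 and 1, so leaves with few customers are never assigned a degenerate probability.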
Besides estimating the point probability $p_i$, we are interested in
estimating a confidence interval for this probability. An approach to a
customer can be considered as a Bernoulli trial. For the sake of simplicity,
we approximate the confidence interval of the Bernoulli parameter with the
normal approximation to the binomial distribution:
$$p_i - z_{1-\alpha/2}\,\sigma_i < p_i < p_i + z_{1-\alpha/2}\,\sigma_i$$

$$\sigma_i = \sigma(m_{k,\text{accept}}, m_{k,\text{reject}}) = \sqrt{\frac{p_i(1 - p_i)}{m_{k,\text{accept}} + m_{k,\text{reject}}}} \qquad (12.11)$$
where $\sigma_i$ represents the estimated standard deviation and $z_{1-\alpha/2}$ denotes
the value in the standard normal distribution table corresponding to the
$1 - \alpha/2$ percentile. For a small $n$ we can use the actual binomial distribution
to estimate the interval.
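A minimal sketch of the normal-approximation interval of Eq. (12.11), using the Laplace estimate of Eq. (12.10) as the point probability; the function name and the example counts are assumptions for illustration:

```python
import math
from statistics import NormalDist

def confidence_interval(m_accept: int, m_reject: int, alpha: float = 0.05):
    """Normal approximation to the binomial confidence interval for the
    leaf probability p_i, per Eqs. (12.10) and (12.11)."""
    n = m_accept + m_reject
    p = (m_accept + 1) / (n + 2)               # Eq. (12.10)
    sigma = math.sqrt(p * (1 - p) / n)         # Eq. (12.11)
    z = NormalDist().inv_cdf(1 - alpha / 2)    # z_{1-alpha/2}, about 1.96 for alpha = 0.05
    return p - z * sigma, p + z * sigma

# Two leaves with the same 4:1 accept/reject proportion but different sizes:
# the smaller leaf gets a similar point estimate with a wider interval.
lo_a, hi_a = confidence_interval(40, 10)   # 50 customers
lo_b, hi_b = confidence_interval(4, 1)     # 5 customers
print(hi_a - lo_a < hi_b - lo_b)           # True
```

For a small $n$, the exact binomial distribution (e.g. a Clopper–Pearson interval) would replace the normal approximation, as noted above.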
To demonstrate the importance of a confidence level, consider two
leaves, leaf $A$ and leaf $B$, in a classification tree. Each leaf holds the
customers in the labeled pool that fit its path. These customers are labeled
as either “accept” or “reject”. If the “accept”/“reject” proportions are the
same, then according to Eq. (12.10), both leaves have the same estimated
probability. Given this, if leaf $A$ has more customers than leaf $B$, then
according to Eq. (12.11), leaf $B$ has a larger confidence interval. Thus,
acquiring an instance for leaf $B$ will have a greater impact on the class
distribution than adding an example to leaf $A$. In the initial iterations,
when the data are limited and the confidence intervals are large, obtaining
an additional instance for the correct leaf is especially important. Moreover,
the potential contribution of labeling the $i$th instance in the same leaf and