Cost-sensitive Active and Proactive Learning of Decision Trees - Data Mining with Decision Trees: Theory and Applications


associated costs are infinite. If for some reason the cost of an action depends

on attributes that are not included in the set of explaining attributes, we

include these attributes in $D$, and call them silent attributes: attributes

that are not used by the supervised learning algorithms, but are included

in the domain of the proactive data mining task.

The benefit function $B : D \times D(T) \to \mathbb{R}$ assigns a real-valued benefit (or

outcome) that represents the company's benefit from any possible record.

The benefit from a specific record depends not only on the value of the

target attribute, but also on the values of the explaining attributes. For

example, the benefit from a loyal client depends not only on the target value of

churning = 0, but also on the explaining attributes of the client, such as his

or her revenue. As in the case of the attribute changing cost function, the

domain D may include silent attributes. In the following section we combine

the benefit and the attribute changing functions and formally define the

objective of the proactive data mining task.
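To make the two functions concrete, here is a minimal Python sketch of a benefit function and an attribute changing cost function for a toy churn setting. The attribute names (`age`, `package`, `revenue`), the flat switching cost, and the rule that `age` cannot be changed are all assumptions made up for illustration; they are not taken from the text.

```python
import math

# Hypothetical settings, chosen only for illustration.
UNCHANGEABLE = ("age",)        # assumed: no action can change these attributes
PACKAGE_SWITCH_COST = 10.0     # assumed flat cost of moving a client to another package


def benefit(record, churning):
    """Toy benefit B(x, t): a loyal client (churning == 0) yields their revenue."""
    if churning == 0:
        return record["revenue"]
    return 0.0


def change_cost(src, dst):
    """Toy attribute changing cost C(x_i, x_j) between two records."""
    # Changing an attribute that cannot be acted upon has infinite cost.
    if any(src[a] != dst[a] for a in UNCHANGEABLE):
        return math.inf
    # Otherwise only a package switch is charged in this sketch.
    return PACKAGE_SWITCH_COST if src["package"] != dst["package"] else 0.0


client = {"age": 34, "package": "basic", "revenue": 120.0}
moved = dict(client, package="premium")   # a feasible move
older = dict(client, age=35)              # an infeasible move

print(benefit(client, churning=0))
print(change_cost(client, moved))
print(change_cost(client, older))
```

Representing infeasible changes by an infinite cost means that any utility computed from such a move is negative infinity, so a maximizing policy never selects it.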


12.6.3 Maximizing Utility

The objective in proactive data mining is to find the optimal decision-making

policy. A policy is a mapping $O : D \to D$ that defines the impact

of some actions on the values of the explaining attributes. In order for a

policy to be optimal, it should maximize the expected value of a utility

function. The utility function that we consider in this topic results from

the benefit and attribute changing cost functions in the following manner:

the addition to the benefit due to the move minus the attribute changing

cost that is associated with that move.
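One possible way to write this definition in symbols is given below. The symbol $U$ and the notation $t_i$ and $t_j$ for the target values before and after the move are assumptions introduced here for illustration; $B$ is the benefit function and $C$ is the attribute changing cost function.

```latex
U(x_i, x_j) = \bigl[ B(x_j, t_j) - B(x_i, t_i) \bigr] - C(x_i, x_j)
```

The bracketed term is the addition to the benefit due to the move from $x_i$ to $x_j$, and the last term is the cost of making that move.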

It should be noted that the stated objective is to find an optimal

policy. The optimal policy may depend on the probability distribution of

the explaining attributes, which is considered unknown. We use the training

set as the empirical distribution, and search for the optimal actions with

regard to that dataset. That is, we search for the policy that, if followed,

will maximize the sum of the utilities that are gained from the N training

observations.
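The empirical objective can be sketched in Python as follows. This is only an illustration under stated assumptions: `predict_churn` is a hypothetical stand-in for a trained classifier, the benefit and cost functions are toy versions, and the three candidate policies and the data are invented. Using the predicted target for both the original and the moved record is also a simplification made here.

```python
def predict_churn(record):
    """Hypothetical stand-in for a trained classifier of the target attribute."""
    return 1 if record["package"] == "basic" else 0


def benefit(record, churning):
    """Toy benefit B(x, t): revenue is earned only from clients who stay."""
    return record["revenue"] if churning == 0 else 0.0


def change_cost(src, dst):
    """Toy cost C(x_i, x_j): a flat fee for switching the package (assumed)."""
    return 0.0 if src["package"] == dst["package"] else 10.0


def utility(src, dst):
    """Addition to the benefit due to the move, minus the cost of the move."""
    gain = benefit(dst, predict_churn(dst)) - benefit(src, predict_churn(src))
    return gain - change_cost(src, dst)


def total_utility(policy, training_set):
    """Sum of the utilities gained over the N training observations."""
    return sum(utility(x, policy(x)) for x in training_set)


# Three candidate policies, each mapping a record to a (possibly changed) record.
def do_nothing(record):
    return record


def upgrade_all(record):
    return dict(record, package="premium")


def upgrade_if_worthwhile(record):
    candidate = upgrade_all(record)
    return candidate if utility(record, candidate) > 0 else record


training_set = [
    {"package": "basic", "revenue": 120.0},
    {"package": "basic", "revenue": 5.0},
    {"package": "premium", "revenue": 80.0},
]

for policy in (do_nothing, upgrade_all, upgrade_if_worthwhile):
    print(policy.__name__, total_utility(policy, training_set))
```

On this toy data the selective policy scores 110.0 against 105.0 for upgrading everyone, because it skips the low-revenue client whose retention is worth less than the switching cost.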

It should also be noted that the cost associated with $O$ can be

calculated directly from the function $C$. The cost of a move, that is, of

changing the values of the explaining attributes from

$x_i = \langle x_{1,i}, x_{2,i}, \ldots, x_{k,i} \rangle$ to $x_j = \langle x_{1,j}, x_{2,j}, \ldots, x_{k,j} \rangle$, is simply $C(x_i, x_j)$.

However, in order to evaluate the benefit that is associated with the move,

we must also know the impact of the change on the target attribute. This
