being positive is estimated by the relative cost of CP and CN; the smaller
the cost, the larger the probability (as minimum cost is sought). Thus, the
probability of being a positive leaf is $PP = 1 - \frac{CP}{CP + CN}$, and
the expected misclassification cost of being positive is $EP = PP \cdot CP$.
Similarly, the expected misclassification cost of being negative is
$EN = (1 - PP) \cdot CN$.
Therefore, without splitting, the expected total misclassification cost of a
given set of examples is $E = EP + EN$. Thus, we calculate the expected
total misclassification cost before ($E$) and after ($E_i$) performing a
certain split using attribute $A_i$. The total expected cost reduction is
$E - E_i - TC_i$,
where $TC_i$ is the cost of testing examples on attribute $A_i$. Note that the
above algorithm can be easily adjusted to the case in which the cost of
obtaining values of a group of attributes is lower than obtaining the value
of each attribute independently.
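The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names `expected_cost` and `cost_reduction` are hypothetical, CP and CN are taken as given inputs, and the sketch assumes that the cost after the split, E_i, is the sum of the expected costs of the subsets produced by the split.

```python
from typing import Iterable, Tuple


def expected_cost(cp: float, cn: float) -> float:
    """Expected total misclassification cost E = EP + EN for one set of examples."""
    if cp + cn == 0:
        # Assumption: a set with no misclassification cost contributes zero.
        return 0.0
    pp = 1 - cp / (cp + cn)  # probability of being a positive leaf
    ep = pp * cp             # expected cost of being positive
    en = (1 - pp) * cn       # expected cost of being negative
    return ep + en


def cost_reduction(cp: float, cn: float,
                   branches: Iterable[Tuple[float, float]],
                   test_cost: float) -> float:
    """Total expected cost reduction E - E_i - TC_i for one candidate split.

    `branches` holds one (CP, CN) pair per subset created by the split.
    Assumption: E_i is the sum of the expected costs of those subsets.
    """
    e_before = expected_cost(cp, cn)
    e_after = sum(expected_cost(b_cp, b_cn) for b_cp, b_cn in branches)
    return e_before - e_after - test_cost


# Hypothetical numbers: a node with CP = 100 and CN = 300, split into two
# branches, with a test cost TC_i of 10.
print(cost_reduction(100, 300, [(20, 180), (80, 120)], test_cost=10))
```

Substituting PP into EP and EN gives the closed form E = 2 · CP · CN / (CP + CN), so the parent node in this example has E = 150. A split is worth making under this criterion only when the reduction is positive, that is, when the drop in expected misclassification cost exceeds the cost of the test.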
12.5 Active Learning
When marketing a service or a product, firms increasingly use predictive
models to estimate the customer interest in their offer. A predictive model
estimates the response probability of potential customers and helps the
decision maker assess the profitability of the various customers. Predictive
models assist in formulating a target marketing strategy: offering the
right product to the right customer at the right time using the proper
distribution channel. The firm can subsequently approach those customers
estimated to be the most interested in the company's product and propose a
marketing offer. A customer that accepts the offer and conducts a purchase
increases the firm's profits. This strategy is more efficient than a mass
marketing strategy, in which a firm offers a product to all known potential
customers, usually resulting in low positive response rates (for example, a
mail marketing response rate of 2%). Predictive models can be built using
data mining methods. These methods are applied to detect useful patterns
in the information available about customers' purchasing behavior.
Data for the models is available, as firms typically maintain databases that
contain massive amounts of information about their existing and potential
customers, such as the customers' demographic characteristics and past
purchase history.
Active learning refers to data mining policies which actively select
unlabeled instances for labeling. Active learning has been previously used
for facilitating direct marketing campaigns. In such campaigns there is
an exploration phase in which several potential customers are approached