Getting Started - Java Data Mining: Strategy, Standard, and Practice

of the likely responders from only 40 percent of the data. The

DMWhizz data miner may decide to specify other algorithms besides

the decision tree that the DME selected, change some of the decision

tree algorithm settings, or prepare the data differently to see if a better

lift can be achieved. If another model produced a higher lift for 40 percent

of the customers, perhaps 0.7, the data miner would likely choose

that model. If another model produced the same lift of 0.6 at 35 percent of

the customers, DMWhizz may choose to send the campaign to fewer

customers while maintaining the same number of likely responses.

We now move on to the evaluation phase of the process.

6.5 Evaluation

In the evaluation phase, we are interested in understanding how well

the model meets the business objectives. We see from the test results of

the model produced above that the lift is 0.6. How does this meet our

business objectives? Out of 1 million customers, historically we know

that 3 percent, or 30,000, should respond. The lift results tell us that by

contacting the right 400,000 customers (or 40 percent of the 1 million

customer base), DMWhizz can get 60 percent of the likely responders,

or 18,000 (60 percent of 30,000). This is a response rate of 4.5 percent.

Given that DMWhizz's original requirement was to increase the

response rate to 4 percent, the expected 4.5 percent provided by data

mining yields a comfortable margin. As a result, DMWhizz decides

to use this model to score the remaining 980,000 customers and proceed

with the campaign.
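
The arithmetic behind this decision can be sketched in a few lines of Java. This is only an illustration, not part of the JDM API: the class and method names are hypothetical, and it treats the lift value quoted in the text as the fraction of likely responders captured at a given contact percentage.

```java
/** Worked arithmetic for the campaign figures quoted in the text (hypothetical helper). */
final class LiftEvaluation {

    /** Expected responders reached: the captured fraction of all historical responders. */
    static long expectedResponders(long customers, double baseRate, double capturedFraction) {
        return Math.round(customers * baseRate * capturedFraction);
    }

    /** Response rate among the contacted customers, returned as a fraction (0.045 = 4.5 percent). */
    static double responseRate(long customers, double baseRate,
                               double capturedFraction, double contactedFraction) {
        double responders = customers * baseRate * capturedFraction;
        double contacted = customers * contactedFraction;
        return responders / contacted;
    }

    public static void main(String[] args) {
        long customers = 1_000_000L; // customer base from the text
        double baseRate = 0.03;      // historical response rate from the text

        // Model from the text: 60 percent of responders from 40 percent of customers.
        System.out.printf("responders: %d%n", expectedResponders(customers, baseRate, 0.6));
        System.out.printf("rate at 0.6 / 40%%: %.4f%n", responseRate(customers, baseRate, 0.6, 0.4));

        // Alternatives mentioned in the modeling discussion.
        System.out.printf("rate at 0.7 / 40%%: %.4f%n", responseRate(customers, baseRate, 0.7, 0.4));
        System.out.printf("rate at 0.6 / 35%%: %.4f%n", responseRate(customers, baseRate, 0.6, 0.35));
    }
}
```

With the figures from the text, the first case gives 18,000 responders out of 400,000 contacted, or 4.5 percent. The two alternatives work out to 5.25 percent (21,000 out of 400,000) and roughly 5.1 percent (18,000 out of 350,000), which is why either would be preferred over the original model.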

6.6 Deployment

In the evaluation phase, DMWhizz found that the model meets the

objective and can be used to complete the campaign. Since we have

already contacted 20,000 of the 1 million customers for our sample

campaign, we apply the model to the remaining 980,000 customer

cases and send a mailing to the top 40 percent of customers predicted

to respond to the campaign. We use the code below to score these

customers in batch, that is, all at once, and produce a separate table

that includes the customer identifier and the probability that the

customer will respond. All the cases are ordered by their probability

in descending order.
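
Independently of the JDM apply task, the selection step described above (order cases by predicted probability, then keep the top 40 percent) can be sketched in plain Java. The ScoredCustomer type and selectTop method are hypothetical names introduced for illustration; they assume the apply output has already been read into memory as pairs of customer identifier and probability.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative post-processing of apply output; not part of the JDM API. */
final class TopFractionSelector {

    /** One row of the assumed apply output: customer identifier and response probability. */
    static final class ScoredCustomer {
        final long customerId;
        final double probability;

        ScoredCustomer(long customerId, double probability) {
            this.customerId = customerId;
            this.probability = probability;
        }
    }

    /** Returns the top fraction of customers, ordered by probability in descending order. */
    static List<ScoredCustomer> selectTop(List<ScoredCustomer> scored, double fraction) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be between 0 and 1");
        }
        // Copy first so the caller's list is left untouched.
        List<ScoredCustomer> ranked = new ArrayList<>(scored);
        ranked.sort(Comparator.comparingDouble((ScoredCustomer c) -> c.probability).reversed());
        // Round down so we never contact more than the requested share.
        int cutoff = (int) Math.floor(ranked.size() * fraction);
        return new ArrayList<>(ranked.subList(0, cutoff));
    }
}
```

Calling selectTop(scored, 0.4) on the 980,000 scored cases would return the 392,000 customers with the highest predicted probability of responding. In practice this ordering and cutoff would more likely be done in the database against the apply output table than in memory.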

To apply the model, we first create a physical dataset object from

the CUSTOMER_APPLY table. The CUSTOMER_APPLY table contains
