To achieve code compactness, we made two assumptions about
the JDM implementation:
• Its ability to build classification models on unprepared data:
If the JDM implementation does not work directly on
unprepared data, the code presented above must be augmented
with data preparation code. Because data preparation is
algorithm-specific, the Java programmer must understand
how to prepare data for the specific algorithm.
• Its ability to build classification models when the ratio of
positive cases is small (such as 2 percent): If a JDM
implementation does not offer an algorithm that gives good
results when the number of positive cases is low, the code
presented earlier must be modified before building models.
Specifically, the build dataset must be produced with all
possible positive cases and a subsample of the negative
cases, that is, a stratified sample allowing a more balanced
(50% to 50%) proportion of positive versus negative cases in
the build dataset. When such stratified sampling is used, the
JDM feature for specifying prior probabilities can be applied in
the ClassificationBuildSettings.
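The data preparation mentioned in the first assumption might look like the following minimal sketch, written in plain Java and independent of any JDM implementation. The class and method names are illustrative only; a real deployment would prepare each attribute according to the needs of the specific algorithm.

```java
import java.util.Arrays;

// Illustrative, algorithm-specific data preparation: min-max normalization
// of a numeric attribute into [0, 1]. Names here are hypothetical, not part
// of the JDM API.
public class DataPrep {
    // Returns a normalized copy of the column; a constant column maps to zeros.
    public static double[] minMaxNormalize(double[] column) {
        double min = Arrays.stream(column).min().orElse(0.0);
        double max = Arrays.stream(column).max().orElse(0.0);
        double range = max - min;
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = (range == 0.0) ? 0.0 : (column[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] income = {20_000, 50_000, 80_000};
        // Prints [0.0, 0.5, 1.0]
        System.out.println(Arrays.toString(minMaxNormalize(income)));
    }
}
```

Other algorithms may instead require binning, missing-value treatment, or explosion of categorical attributes, which is why the programmer must know which transformations the chosen algorithm expects.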
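The stratified-sampling step described in the second assumption can be sketched as follows, again in plain Java and independent of any particular JDM implementation. The record representation, class labels, and priors map are illustrative assumptions; in JDM, a map of original class proportions would be supplied through the prior-probabilities feature of the classification build settings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch of producing a balanced build dataset plus the original priors.
public class StratifiedSampler {
    // Keeps every positive case and randomly subsamples the negatives down
    // to the same count, yielding a roughly 50%/50% build dataset.
    public static List<String> balance(List<String> positives,
                                       List<String> negatives,
                                       long seed) {
        List<String> shuffled = new ArrayList<>(negatives);
        Collections.shuffle(shuffled, new Random(seed));
        List<String> build = new ArrayList<>(positives);
        build.addAll(shuffled.subList(0,
                Math.min(positives.size(), shuffled.size())));
        return build;
    }

    // Original class proportions, to be declared as prior probabilities so
    // the model is not misled by the artificially balanced build data.
    // The labels "responder"/"non-responder" are hypothetical.
    public static Map<String, Double> priors(int numPositive, int numNegative) {
        double total = numPositive + numNegative;
        Map<String, Double> priorMap = new HashMap<>();
        priorMap.put("responder", numPositive / total);
        priorMap.put("non-responder", numNegative / total);
        return priorMap;
    }
}
```

With 2 positive cases out of 100, for example, the build dataset would contain the 2 positives and 2 sampled negatives, while the priors map would still record the true 2%/98% split for the DME to use.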
Business Scenario 2: Understanding Key Factors
HEW has used the CampaignOptimizer code to compare several JDM
implementations in one experimental campaign and was particularly
impressed with the results, both in terms of ease of use and
performance. The campaign manager, however, wanted more information
about how the models were making their predictions. For example,
he would have loved to see which attributes were most used by the
DME to compute the probability that a prospective customer will
respond to that mailing.
With this new objective, the user has decided to complement the
CampaignOptimizer object with a feature to indicate key factors. This
is a small enhancement to the previous project, which does not
require any new design considerations.
In JDM, the notion of key factors can be obtained through attribute
importance. There are two ways to obtain a list of the important