To achieve code compactness, we made two assumptions about
the JDM implementation:
• Its ability to build classification models on unprepared data:
If the JDM implementation does not work directly on
unprepared data, the code presented above must be augmented
with data preparation code. Because data preparation is
algorithm-specific, the Java programmer must understand
how to prepare data for the specific algorithm.
• Its ability to build classification models when the ratio of
positive cases is small (such as 2 percent): If a JDM
implementation does not offer an algorithm that gives good
results when the number of positive cases is low, the code
presented earlier must be modified before building models.
Specifically, the build dataset must be produced with all
possible positive cases and a subsample of the negative
cases, that is, a stratified sample allowing a more balanced
(50% to 50%) proportion of positive versus negative cases in
the build dataset. When such stratified sampling is used, the
JDM feature for specifying prior probabilities can be applied in
the ClassificationBuildSettings.
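The data preparation mentioned in the first assumption might look like the following minimal sketch, written in plain Java and independent of any JDM implementation. The class and method names are illustrative only; a real deployment would prepare each attribute according to the needs of the specific algorithm.

```java
import java.util.Arrays;

// Illustrative, algorithm-specific data preparation: min-max normalization
// of a numeric attribute into [0, 1]. Names here are hypothetical, not part
// of the JDM API.
public class DataPrep {
    // Returns a normalized copy of the column; a constant column maps to zeros.
    public static double[] minMaxNormalize(double[] column) {
        double min = Arrays.stream(column).min().orElse(0.0);
        double max = Arrays.stream(column).max().orElse(0.0);
        double range = max - min;
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = (range == 0.0) ? 0.0 : (column[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] income = {20_000, 50_000, 80_000};
        // Prints [0.0, 0.5, 1.0]
        System.out.println(Arrays.toString(minMaxNormalize(income)));
    }
}
```

Other algorithms may instead require binning, missing-value treatment, or explosion of categorical attributes, which is why the programmer must know which transformations the chosen algorithm expects.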
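The stratified-sampling step described in the second assumption can be sketched as follows, again in plain Java and independent of any particular JDM implementation. The record representation, class labels, and priors map are illustrative assumptions; in JDM, a map of original class proportions would be supplied through the prior-probabilities feature of the classification build settings.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Sketch of producing a balanced build dataset plus the original priors.
public class StratifiedSampler {
    // Keeps every positive case and randomly subsamples the negatives down
    // to the same count, yielding a roughly 50%/50% build dataset.
    public static List<String> balance(List<String> positives,
                                       List<String> negatives,
                                       long seed) {
        List<String> shuffled = new ArrayList<>(negatives);
        Collections.shuffle(shuffled, new Random(seed));
        List<String> build = new ArrayList<>(positives);
        build.addAll(shuffled.subList(0,
                Math.min(positives.size(), shuffled.size())));
        return build;
    }

    // Original class proportions, to be declared as prior probabilities so
    // the model is not misled by the artificially balanced build data.
    // The labels "responder"/"non-responder" are hypothetical.
    public static Map<String, Double> priors(int numPositive, int numNegative) {
        double total = numPositive + numNegative;
        Map<String, Double> priorMap = new HashMap<>();
        priorMap.put("responder", numPositive / total);
        priorMap.put("non-responder", numNegative / total);
        return priorMap;
    }
}
```

With 2 positive cases out of 100, for example, the build dataset would contain the 2 positives and 2 sampled negatives, while the priors map would still record the true 2%/98% split for the DME to use.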
Business Scenario 2: Understanding Key Factors
HEW has used the CampaignOptimizer code to compare several JDM
implementations in one experimental campaign and was particularly
impressed with the results, both in terms of ease of use and
performance. The campaign manager, however, wanted more information
about how the models were making their predictions. For example,
he would have loved to see which attributes were most used by the
DME to compute the probability that a prospective customer will
respond to that mailing.
With this new objective, the user has decided to complement the
CampaignOptimizer object with a feature to indicate key factors. This
is a small enhancement to the previous project, which does not
require any new design considerations.
In JDM, the notion of key factors can be obtained through attribute
importance. There are two ways to obtain a list of the important