new cases to clusters. Even association rules can be used for prediction
[... 1998, Li ... 2001, Antonie/Zaiane 2002], although this is not
covered by JDM 1.1. Chapter 18, on JDM 2.0 features, covers this briefly.
Classification can be used to make predictions involving a wide
range of problems, including campaign response, customer
segmentation, churn and attrition modeling, credit analysis, and patient
outcomes. Many of these were discussed in Chapter 2.
The notion of classification is to classify cases according to a fixed set
of categories. A category, or class, is simply a discrete value with some
well-defined meaning. For example, the problem of determining “will
the customer respond to a campaign” typically has two categories: yes
and no. Because classification is a supervised mining function, the data
used to build a classification model needs to include a target attribute
containing the known outcomes. In the response modeling example from Chapter
2, “yes” in the target attribute of build or test data indicates that the
customer responded to the campaign; “no” indicates that the customer
did not respond.
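To make the idea of a target attribute concrete, the following is a minimal sketch in plain Java. It does not use the JDM API; the names ResponseData, Response, LabeledCase, and count are hypothetical, and the three cases are made-up illustrative values rather than data from the book.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical names throughout; this is plain Java, not the JDM API.
class ResponseData {

    // The fixed set of categories (classes) for the target attribute.
    enum Response { YES, NO }

    // One case of build or test data: an identifier plus the known outcome.
    static final class LabeledCase {
        final int customerId;
        final Response target; // known outcome of the past campaign

        LabeledCase(int customerId, Response target) {
            this.customerId = customerId;
            this.target = target;
        }
    }

    // Counts how many cases carry the given target value.
    static long count(List<LabeledCase> cases, Response value) {
        long n = 0;
        for (LabeledCase c : cases) {
            if (c.target == value) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // Made-up build data: one responder and two non-responders.
        List<LabeledCase> buildData = Arrays.asList(
            new LabeledCase(1, Response.YES),
            new LabeledCase(2, Response.NO),
            new LabeledCase(3, Response.NO));
        System.out.println("responders: " + count(buildData, Response.YES));
        System.out.println("non-responders: " + count(buildData, Response.NO));
    }
}
```

Modeling the target as an enum reflects the point that the categories form a fixed set; a real data set would carry the target as a column alongside the predictors.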
In the apply results, where the predictions are placed, “yes”
indicates that the model predicted the customer will respond to a
similar future campaign. This prediction is often accompanied by a
probability; for example, the customer is 80 percent likely to
respond to this campaign. This can be taken to mean that out of 100
customers with a similar probability, about 80 can be expected to
respond to the campaign. Probabilities are quite useful since they
can be used to compute, for example, expected return on a campaign
by multiplying each customer's probability by that customer's expected
order value and summing the results. Problems like response modeling
normally have two possible outcomes, making them binary classification
problems, but classification can also be used to predict more than two
possible outcomes, a multiclass classification problem.
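The expected-return calculation described above can be sketched in a few lines of plain Java. This is an illustration under stated assumptions, not part of JDM: the class and method names are hypothetical, and the probabilities and order values in main are made-up numbers.

```java
// Hypothetical names; plain Java rather than the JDM API.
class CampaignEstimate {

    // Sum of the predicted "yes" probabilities, which gives the
    // expected number of responders among the scored customers.
    static double expectedResponders(double[] probabilities) {
        double total = 0.0;
        for (double p : probabilities) {
            total += p;
        }
        return total;
    }

    // Expected return: for each customer, multiply the response
    // probability by the expected order value, then sum the results.
    static double expectedReturn(double[] probabilities, double[] orderValues) {
        if (probabilities.length != orderValues.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double total = 0.0;
        for (int i = 0; i < probabilities.length; i++) {
            total += probabilities[i] * orderValues[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up apply results for three customers.
        double[] probabilities = {0.8, 0.5, 0.25};
        double[] orderValues = {100.0, 40.0, 200.0};
        System.out.println("expected responders: " + expectedResponders(probabilities));
        System.out.println("expected return: " + expectedReturn(probabilities, orderValues));
    }
}
```

For the three illustrative customers, the terms are 0.8 × 100, 0.5 × 40, and 0.25 × 200, so the expected return is 80 + 20 + 50 = 150. The same reasoning gives the "80 out of 100" reading: summing a probability of 0.8 over 100 customers yields about 80 expected responders.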
Let's take a look at the data used to build a classification model.
As a type of supervised learning, classification algorithms build a
model from a set of predictors used to predict a target. As illustrated in
Figure 4-1 for the response modeling problem, a set of predictors
may include demographic data such as age, income, number of
children, and head of household (yes or no), to predict a customer's
response to a campaign. The data for classification contains attribute