a nonresponder. It is from this trial, or sample, of the population that
we can also test model accuracy and obtain a lift chart.
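A minimal Python sketch of how the trial results can produce the points of a cumulative lift chart. It assumes each customer in the trial sample has a model score and a recorded 0/1 response. The function name `cumulative_lift` and the choice of equal-size bins are illustrative assumptions, not any particular product's API.

```python
def cumulative_lift(scores, responded, n_bins=10):
    """Cumulative lift for each bin of the score-ranked trial sample.

    scores: model scores (higher = more likely to respond)
    responded: actual trial outcomes, 1 = responder, 0 = nonresponder
    """
    if not scores or len(scores) != len(responded):
        raise ValueError("scores and responded must be non-empty and the same length")
    overall_rate = sum(responded) / len(responded)
    if overall_rate == 0:
        raise ValueError("no responders in the trial; lift is undefined")
    # Order the actual outcomes from highest to lowest model score.
    ranked = [r for _, r in sorted(zip(scores, responded), key=lambda p: p[0], reverse=True)]
    n = len(ranked)
    lifts = []
    for b in range(1, n_bins + 1):
        cutoff = max(1, round(n * b / n_bins))  # size of the top b/n_bins fraction
        top = ranked[:cutoff]
        # Response rate among the top-scored customers relative to the baseline rate.
        lifts.append((sum(top) / len(top)) / overall_rate)
    return lifts


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(cumulative_lift(scores, responded, n_bins=5))
```

In this toy sample the top 20% of scored customers respond at about 3.3 times the baseline rate, and the lift falls to 1.0 once the whole sample is included.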
Response modeling can be combined with a value prediction,
such as order amount or donation size, to derive an
expected return on the campaign. A regression model can be built to
predict, for example, the amount each customer spends or each
alumnus donates. Multiplying this value by the probability that a
given customer will respond to the campaign produces an expected
value for that customer. Customers can then be ranked not only by
likelihood to respond but also by expected value, which identifies the
customers likely to spend or donate the most.
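The expected-value calculation can be sketched in a few lines of Python. This assumes the response probability and the predicted amount have already been produced by the two models described above; the `Customer` record and `rank_by_expected_value` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    p_respond: float         # probability of responding (from the response model)
    predicted_amount: float  # predicted spend or donation (from the regression model)


def rank_by_expected_value(customers):
    """Return (customer_id, expected_value) pairs, highest expected value first."""
    scored = [(c.customer_id, c.p_respond * c.predicted_amount) for c in customers]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


customers = [
    Customer("A", p_respond=0.5, predicted_amount=40.0),   # expected value 20
    Customer("B", p_respond=0.1, predicted_amount=500.0),  # expected value 50
    Customer("C", p_respond=0.9, predicted_amount=10.0),   # expected value 9
]
print(rank_by_expected_value(customers))
```

Ranked by probability alone, the order would be C, A, B; ranked by expected value, it becomes B, A, C, so a customer who rarely responds but gives a lot can move to the top of the list.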
Another refinement of response modeling is to determine the best
channel for approaching these customers: for example, mail, e-mail,
or phone. Again using historical data, we can learn the pattern of
customers who respond best to each of these channels.
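As a simplified illustration, the Python sketch below learns a best channel from historical contact records. It assumes each record carries a customer segment, the channel used, and whether the customer responded; the segment label is a hypothetical stand-in for the predictor attributes a real model (for example, one response model per channel) would use.

```python
from collections import defaultdict


def best_channel_by_segment(history):
    """history: iterable of (segment, channel, responded) tuples, responded = 1 or 0.

    Returns {segment: (best_channel, response_rate)} using observed response rates.
    """
    counts = defaultdict(lambda: [0, 0])  # (segment, channel) -> [responses, contacts]
    for segment, channel, responded in history:
        counts[(segment, channel)][0] += responded
        counts[(segment, channel)][1] += 1
    best = {}
    for (segment, channel), (hits, total) in counts.items():
        rate = hits / total
        # Keep the channel with the highest response rate; ties keep the first seen.
        if segment not in best or rate > best[segment][1]:
            best[segment] = (channel, rate)
    return best


history = [
    ("young", "e-mail", 1), ("young", "e-mail", 1),
    ("young", "mail", 0), ("young", "mail", 1),
    ("senior", "mail", 1), ("senior", "mail", 1),
    ("senior", "phone", 1), ("senior", "phone", 0),
    ("senior", "e-mail", 0),
]
print(best_channel_by_segment(history))
```

Using observed rates per segment keeps the idea visible; with many attributes, a classification model per channel would replace the lookup table.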
Anywhere money is involved, the potential for fraud exists; all
industries are vulnerable to individuals who abuse established
procedures for personal gain, often illegally. Healthcare, financial
services, and taxation are just a few areas where fraud is found.
One approach to fraud detection involves clustering. The objective
is first to group the data into clusters. We can then review each of the
clusters to see if there is a concentration of known fraud in any one
cluster, indicating that fraud is more likely to occur within that
cluster than in others. In addition, we can look for cases that don't
match any of the known clusters particularly well, or at all. These
outliers become prime candidates for investigation.
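The clustering approach can be sketched in Python with plain k-means (Lloyd's algorithm), one common clustering technique; the text does not prescribe a specific algorithm. The sketch assumes numeric, already-scaled attributes, fixed starting centroids, and a hand-picked distance threshold for outliers. All function names and the toy data are hypothetical.

```python
import math


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def kmeans(points, init_centroids, max_iter=100):
    """Lloyd's k-means from fixed starting centroids; returns (labels, centroids)."""
    centroids = [tuple(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # leave an empty cluster's centroid where it is
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def fraud_rate_by_cluster(labels, is_fraud, k):
    """Share of known fraud cases in each cluster."""
    rates = []
    for c in range(k):
        flags = [f for lab, f in zip(labels, is_fraud) if lab == c]
        rates.append(sum(flags) / len(flags) if flags else 0.0)
    return rates


def outliers(points, labels, centroids, threshold):
    """Indices of cases that sit far from the centroid of their own cluster."""
    return [i for i, (p, lab) in enumerate(zip(points, labels))
            if math.dist(p, centroids[lab]) > threshold]


# Toy two-attribute cases with a known-fraud flag (illustrative only).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (9.0, 0.5)]
is_fraud = [0, 0, 0, 1, 1, 0, 0]
labels, centroids = kmeans(points, [(1.0, 1.0), (5.0, 5.0)])
print(fraud_rate_by_cluster(labels, is_fraud, k=2))
print(outliers(points, labels, centroids, threshold=3.0))
```

Here the second cluster holds all the known fraud, and the last case is assigned to it only because it must go somewhere; its large distance from that cluster's centroid marks it as an outlier worth reviewing.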
A second approach to fraud detection involves classification.
We first identify examples of fraud manually in historical data.
With classification, the goal is to learn to distinguish between
fraudulent and nonfraudulent behavior. Consider a dataset consisting
of various predictor attributes, such as “age,” “income,” and
“wire transfer within last 10 days,” along with a target attribute
indicating whether the case was fraudulent. A classification algorithm
such as a decision tree or a support vector machine can then predict
the likelihood of fraud on new data. Cases with a high probability of
fraud are good candidates for investigation. In addition, we can
predict the likelihood of fraud on the original data, which allows us
to compare the actual target values with the predicted values.
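To keep the example short and dependency-free, the Python sketch below swaps in the simplest possible decision tree: a single split (a "stump") chosen by Gini impurity. A real deployment would use a full decision tree or support vector machine. The attribute spellings, the toy records, and the 0.5 cutoff are illustrative assumptions. The sketch trains on labeled history, scores a new case, and then rescores the original data to compare predicted and actual labels.

```python
FEATURES = ["age", "income", "wire_transfer_last_10_days"]


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(rows, target, features):
    """Pick the single (feature, threshold) split with the lowest weighted Gini."""
    best = None
    n = len(rows)
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, target) if r[f] <= t]
            right = [y for r, y in zip(rows, target) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, f, t, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("no split separates the data")
    _, f, t, p_left, p_right = best
    return {"feature": f, "threshold": t, "p_left": p_left, "p_right": p_right}


def predict_proba(stump, row):
    """Estimated fraud probability = fraud share in the leaf the row falls into."""
    return stump["p_left"] if row[stump["feature"]] <= stump["threshold"] else stump["p_right"]


def disagreements(stump, rows, target, cutoff=0.5):
    """Indices where the predicted label (p >= cutoff) differs from the recorded label."""
    return [i for i, (row, y) in enumerate(zip(rows, target))
            if (predict_proba(stump, row) >= cutoff) != bool(y)]


# Toy historical cases with a manually identified fraud label (illustrative only).
HISTORY = [
    {"age": 34, "income": 52000, "wire_transfer_last_10_days": 0},
    {"age": 45, "income": 31000, "wire_transfer_last_10_days": 0},
    {"age": 29, "income": 61000, "wire_transfer_last_10_days": 1},
    {"age": 52, "income": 38000, "wire_transfer_last_10_days": 0},
    {"age": 41, "income": 47000, "wire_transfer_last_10_days": 1},
    {"age": 38, "income": 58000, "wire_transfer_last_10_days": 1},
    {"age": 60, "income": 82000, "wire_transfer_last_10_days": 0},
    {"age": 25, "income": 75000, "wire_transfer_last_10_days": 1},
]
IS_FRAUD = [0, 0, 1, 0, 1, 0, 0, 1]

stump = fit_stump(HISTORY, IS_FRAUD, FEATURES)
new_case = {"age": 33, "income": 50000, "wire_transfer_last_10_days": 1}
print(stump, predict_proba(stump, new_case))
print(disagreements(stump, HISTORY, IS_FRAUD))
```

On this toy data the stump splits on the wire-transfer attribute, and rescoring the original cases surfaces one case whose recorded label disagrees with its predicted label.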