Solving Problems in Industry - Java Data Mining: Strategy, Standard, and Practice


Where there are discrepancies (i.e., the case was believed to be nonfraudulent, but predicted to be fraudulent), there are opportunities for investigation.

In fraud detection, we want to ensure we catch fraud (minimize false negatives, which incorrectly identify fraud events as nonfraud events) while avoiding investigating too many red herrings (minimize false positives, which incorrectly identify nonfraud events as fraud events), since the costs associated with investigating fraud can be high.
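To make the two error types concrete, here is a minimal Java sketch that counts false negatives and false positives for a binary fraud model. It assumes labels are stored as integer arrays with 1 = fraud and 0 = nonfraud; the class and method names (`FraudErrorCounts`, `falseNegatives`, `falsePositives`) are hypothetical and are not part of the JDM API. The false positives it counts correspond to the discrepancies mentioned earlier: cases believed to be nonfraudulent but predicted to be fraudulent.

```java
// Minimal sketch (hypothetical names, not the JDM API): counts the two error
// types for a binary fraud model. Labels are assumed to be 1 = fraud, 0 = nonfraud.
final class FraudErrorCounts {

    private FraudErrorCounts() {}

    /** False negatives: actual fraud (1) predicted as nonfraud (0). */
    static int falseNegatives(int[] actual, int[] predicted) {
        checkSameLength(actual, predicted);
        int count = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == 1 && predicted[i] == 0) {
                count++;
            }
        }
        return count;
    }

    /** False positives: actual nonfraud (0) predicted as fraud (1). */
    static int falsePositives(int[] actual, int[] predicted) {
        checkSameLength(actual, predicted);
        int count = 0;
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] == 0 && predicted[i] == 1) {
                count++;
            }
        }
        return count;
    }

    private static void checkSameLength(int[] actual, int[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
    }

    public static void main(String[] args) {
        // Hypothetical labels for six cases, for illustration only.
        int[] actual    = {1, 0, 0, 1, 0, 1};
        int[] predicted = {1, 1, 0, 0, 0, 1};
        System.out.println("False negatives: " + falseNegatives(actual, predicted)); // 1
        System.out.println("False positives: " + falsePositives(actual, predicted)); // 1
    }
}
```

Keeping the two counts separate, rather than reporting a single error rate, reflects the point above: the two kinds of mistakes carry different costs in fraud detection.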

We digress briefly to discuss the types of errors possible for a classification model. Figure 2-4 illustrates a typical report on prediction accuracy, where Type I error refers to a false negative prediction and Type II error to a false positive prediction. (This is the reverse of the usual statistical convention: if "nonfraud" is taken as the null hypothesis, a Type I error, rejecting a true null hypothesis, is a false positive.) The columns are labeled with the possible predicted classes; in this binary case, “0” corresponds to the negative (nonfraud) prediction and “1” to the positive (fraud) prediction. The value reported where actual and predicted both equal “1” is the number of times the positive class was predicted correctly. Similarly, the value reported where actual and predicted both equal “0” is the number of times the negative class was predicted correctly. More than two values are possible when predicting multiple outcomes; in that case, the matrix is n × n instead of 2 × 2, where n is the number of possible values.
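A confusion matrix like the one in Figure 2-4 can be tallied directly from actual and predicted labels. The following is a minimal sketch, assuming classes are encoded as integers 0 through n-1; `ConfusionMatrix`, `build`, and `correct` are illustrative names, not the JDM or Oracle Data Miner API. Rows index the actual class and columns the predicted class, so the same code yields a 2 × 2 matrix for the binary fraud case and an n × n matrix for n classes.

```java
// Minimal sketch (illustrative names, not the JDM or Oracle Data Miner API):
// builds an n x n confusion matrix with rows = actual class and
// columns = predicted class. Classes are assumed to be integers 0..n-1.
final class ConfusionMatrix {

    private ConfusionMatrix() {}

    static long[][] build(int[] actual, int[] predicted, int numClasses) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("actual and predicted must have the same length");
        }
        long[][] matrix = new long[numClasses][numClasses];
        for (int i = 0; i < actual.length; i++) {
            // Throws ArrayIndexOutOfBoundsException if a label is outside 0..numClasses-1.
            matrix[actual[i]][predicted[i]]++;
        }
        return matrix;
    }

    /** Sum of the diagonal: cases whose predicted class equals the actual class. */
    static long correct(long[][] matrix) {
        long sum = 0;
        for (int k = 0; k < matrix.length; k++) {
            sum += matrix[k][k];
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical binary example: 0 = nonfraud, 1 = fraud.
        int[] actual    = {0, 0, 1, 1, 0, 1, 0};
        int[] predicted = {0, 1, 1, 0, 0, 1, 0};
        long[][] m = build(actual, predicted, 2);
        // m[0][0] = correct nonfraud, m[1][1] = correct fraud,
        // m[0][1] = false positives, m[1][0] = false negatives.
        System.out.println(java.util.Arrays.deepToString(m)); // [[3, 1], [1, 2]]
    }
}
```

In the binary case, the off-diagonal cells hold the two error counts: actual “0” predicted “1” (false positives) and actual “1” predicted “0” (false negatives).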

[Figure image: confusion matrix report with the Type I error and Type II error cells marked.]

Accuracy = Total Correct / Total Scored = (523 + 143) / 834 = 666 / 834 ≈ 0.7986

Figure 2-4 Assessing prediction accuracy via Type I and Type II error. Source: Screen capture from Oracle Data Miner graphical interface.
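The accuracy computation in Figure 2-4 divides the correct (diagonal) counts by the total number of cases scored. A minimal Java sketch follows; the class and method names (`Accuracy`, `fromCounts`, `fromMatrix`) are hypothetical, not part of the JDM API. Because the figure gives only the two correct counts (523 and 143) and the total (834), `fromCounts` is enough to reproduce its result, while `fromMatrix` shows the general n × n form, summing the diagonal over all cells.

```java
// Minimal sketch (hypothetical names): accuracy = total correct / total scored.
final class Accuracy {

    private Accuracy() {}

    static double fromCounts(long totalCorrect, long totalScored) {
        if (totalScored <= 0) {
            throw new IllegalArgumentException("totalScored must be positive");
        }
        return (double) totalCorrect / totalScored;
    }

    /** Accuracy from an n x n confusion matrix: diagonal sum over the sum of all cells. */
    static double fromMatrix(long[][] matrix) {
        long correct = 0;
        long total = 0;
        for (int r = 0; r < matrix.length; r++) {
            for (int c = 0; c < matrix[r].length; c++) {
                total += matrix[r][c];
                if (r == c) {
                    correct += matrix[r][c];
                }
            }
        }
        return fromCounts(correct, total);
    }

    public static void main(String[] args) {
        // Counts from Figure 2-4: 523 and 143 correct predictions out of 834 scored.
        double acc = fromCounts(523 + 143, 834);
        // Locale.ROOT keeps '.' as the decimal separator regardless of system locale.
        System.out.printf(java.util.Locale.ROOT, "Accuracy = %.4f%n", acc);
    }
}
```

Running `main` prints `Accuracy = 0.7986`; the exact ratio 666/834 is 0.79856…, which rounds to 0.7986 at four decimal places.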
