Java Data Mining Concepts - Java Data Mining: Strategy, Standard, and Practice

Figure 7-7 Receiver operating characteristics. (a) Probability threshold: customer cases on a 0.0 to 1.0 scale, labeled "Predicted as Attriter" and "Predicted as Non-attriter". (b) ROC curves: Model A, Model B, and Random, plotted against the false positive rate.

In the ROC graph, the point (0,1) represents the perfect classifier [4]: it classifies all positive cases and all negative cases correctly. It is (0,1) because the false positive rate is 0 (none), and the true positive rate is 1 (all). The point (0,0) represents a classifier that predicts all cases to be negative, while the point (1,1) corresponds to a classifier that predicts every case to be positive. The point (1,0) represents a classifier that is incorrect for every case.
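To make the four corner points concrete, the following minimal sketch computes a classifier's position in ROC space from its predictions. The class `RocPointSketch` and its method names are hypothetical helpers written for illustration; they are not part of the JDM API or of any library. The sketch assumes the actual labels contain at least one positive and one negative case, since either rate is otherwise undefined.

```java
/** Illustrative only: hypothetical helper, not part of JDM or any library. */
final class RocPointSketch {

    /** False positive rate, FP / (FP + TN): the x-coordinate in ROC space. */
    static double falsePositiveRate(int fp, int tn) {
        return (double) fp / (fp + tn);
    }

    /** True positive rate, TP / (TP + FN): the y-coordinate in ROC space. */
    static double truePositiveRate(int tp, int fn) {
        return (double) tp / (tp + fn);
    }

    /**
     * Returns {FPR, TPR} for one set of predictions (true = positive).
     * Assumes actual[] holds at least one positive and one negative case;
     * otherwise a rate is 0/0 and the result is NaN.
     */
    static double[] rocPoint(boolean[] actual, boolean[] predicted) {
        int tp = 0, fp = 0, tn = 0, fn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i]) {
                if (actual[i]) tp++; else fp++;
            } else {
                if (actual[i]) fn++; else tn++;
            }
        }
        return new double[] { falsePositiveRate(fp, tn), truePositiveRate(tp, fn) };
    }

    public static void main(String[] args) {
        // Made-up labels: two positive cases and three negative cases.
        boolean[] actual      = { true, true, false, false, false };
        boolean[] allNegative = new boolean[actual.length];      // every element false
        boolean[] allPositive = { true, true, true, true, true };
        boolean[] inverted    = { false, false, true, true, true };

        print("perfect     ", rocPoint(actual, actual));
        print("all negative", rocPoint(actual, allNegative));
        print("all positive", rocPoint(actual, allPositive));
        print("always wrong", rocPoint(actual, inverted));
    }

    private static void print(String name, double[] point) {
        System.out.println(name + " -> (" + point[0] + ", " + point[1] + ")");
    }
}
```

Running `main` on these made-up labels places the four classifiers at the corners described above: the perfect classifier at (0,1), the all-negative classifier at (0,0), the all-positive classifier at (1,1), and the always-wrong classifier at (1,0).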

Lift and cumulative gain are also popular metrics to assess the effectiveness of a classification model. Lift is the ratio between the results obtained using the classification model and those obtained using a random selection. Cumulative gain is the percentage of positive responses determined by the model across quantiles of the data. Cases are typically divided into 10 or 100 quantiles against which the lift and cumulative gain are reported, as illustrated later in Table 7.5. The lift chart and the cumulative gains chart are often used as visual aids for assessing model performance. An understanding of how cumulative lift and cumulative gains are computed helps in understanding the cumulative lift and cumulative gains charts illustrated in Figure 7-8.
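As a rough illustration of how these quantities can be computed, the sketch below ranks cases by predicted probability, splits them into quantiles, and reports cumulative gain and cumulative lift per quantile. It uses one common formulation, stated here as an assumption rather than as the book's exact procedure: cumulative gain at a quantile is the fraction of all positive cases found among the cases ranked so far, and cumulative lift is that gain divided by the fraction of cases examined (the gain a random selection would be expected to achieve). The class `LiftGainSketch`, its method names, and the sample data in the usage note are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Illustrative only: hypothetical helper, not part of JDM or any library. */
final class LiftGainSketch {

    /**
     * Cumulative gain per quantile: the fraction of all positive cases that
     * fall within the top-ranked cases up to and including that quantile.
     * Assumes actual[] contains at least one positive case.
     */
    static double[] cumulativeGain(double[] scores, boolean[] actual, int quantiles) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Rank cases from highest to lowest predicted probability.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> scores[i]).reversed());

        int totalPositives = 0;
        for (boolean a : actual) {
            if (a) totalPositives++;
        }

        double[] gain = new double[quantiles];
        int seen = 0;
        int positives = 0;
        for (int q = 0; q < quantiles; q++) {
            // Index just past the last case belonging to quantile q.
            int end = (int) Math.round((q + 1) * (double) scores.length / quantiles);
            while (seen < end) {
                if (actual[order[seen]]) positives++;
                seen++;
            }
            gain[q] = (double) positives / totalPositives;
        }
        return gain;
    }

    /**
     * Cumulative lift per quantile: cumulative gain divided by the fraction of
     * cases examined, which is the expected gain of a random selection.
     */
    static double[] cumulativeLift(double[] gain) {
        double[] lift = new double[gain.length];
        for (int q = 0; q < gain.length; q++) {
            double fractionOfCases = (q + 1) / (double) gain.length;
            lift[q] = gain[q] / fractionOfCases;
        }
        return lift;
    }
}
```

For example, with ten made-up cases scored 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, and 0.05, of which the first, second, fourth, and seventh are actual positives, and five quantiles of two cases each, the top quantile contains two of the four positives. Its cumulative gain is therefore 50 percent from 20 percent of the cases, a cumulative lift of about 2.5. By the last quantile the gain reaches 100 percent and the lift falls to 1, since selecting every case is no better than random.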

[4] A classification model is also referred to as a classifier since it classifies cases among the possible target values.
