highest probability. The first row of Figure 8-8(a) depicts this case.
However, if we use the cost matrix in Figure 8-8(b), the cost of predicting a Non-attriter value is computed from the cost the business incurs when the actual value is Attriter, weighted by the probability that the customer is an Attriter, and vice versa. If this cost matrix is applied to the same customer case, the cost of predicting Attriter is $30 and the cost of predicting Non-attriter is $45, as shown in the second row of Figure 8-8(a). Since we choose the lowest-cost prediction, the model now predicts this same customer as an Attriter ($30 < $45).
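The cost-based choice described above can be sketched in plain Java. This is a minimal illustration, not JDM API code: the class `CostBasedPrediction`, its methods, and the nested-map layout of the cost matrix are all assumptions made for the example. The probabilities and costs are taken from Figure 8-8. Note that with those inputs the product 0.7 × $50 evaluates to $35 rather than the $30 quoted; the sketch reports the computed value, and Attriter remains the lower-cost prediction either way.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CostBasedPrediction {

    /** Probabilities for the example customer, as shown in Figure 8-8(a). */
    static Map<String, Double> exampleProbabilities() {
        Map<String, Double> p = new LinkedHashMap<>();
        p.put("Attriter", 0.30);
        p.put("Non-Attriter", 0.70);
        return p;
    }

    /** Cost matrix from Figure 8-8(b), indexed as cost.get(actual).get(predicted). */
    static Map<String, Map<String, Double>> exampleCostMatrix() {
        Map<String, Double> actualAttriter = new LinkedHashMap<>();
        actualAttriter.put("Attriter", 0.0);        // true positive
        actualAttriter.put("Non-Attriter", 150.0);  // false negative

        Map<String, Double> actualNonAttriter = new LinkedHashMap<>();
        actualNonAttriter.put("Attriter", 50.0);    // false positive
        actualNonAttriter.put("Non-Attriter", 0.0); // true negative

        Map<String, Map<String, Double>> cost = new LinkedHashMap<>();
        cost.put("Attriter", actualAttriter);
        cost.put("Non-Attriter", actualNonAttriter);
        return cost;
    }

    /** Expected cost of a prediction: sum over actual values of P(actual) * cost(actual, predicted). */
    static double expectedCost(String predicted,
                               Map<String, Double> probabilities,
                               Map<String, Map<String, Double>> costMatrix) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : probabilities.entrySet()) {
            total += e.getValue() * costMatrix.get(e.getKey()).get(predicted);
        }
        return total;
    }

    /** Returns the candidate prediction with the lowest expected cost. */
    static String lowestCostPrediction(Map<String, Double> probabilities,
                                       Map<String, Map<String, Double>> costMatrix) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (String candidate : probabilities.keySet()) {
            double c = expectedCost(candidate, probabilities, costMatrix);
            if (c < bestCost) {
                bestCost = c;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Double> p = exampleProbabilities();
        Map<String, Map<String, Double>> cost = exampleCostMatrix();
        for (String candidate : p.keySet()) {
            System.out.println(candidate + " expected cost: " + expectedCost(candidate, p, cost));
        }
        System.out.println("Lowest-cost prediction: " + lowestCostPrediction(p, cost));
    }
}
```

Without a cost matrix the same customer would be labeled Non-Attriter, because 0.70 is the highest probability; the cost matrix flips the decision because a missed Attriter is three times as expensive as a false alarm.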
Figure 8-9 shows the apply content options for each function that supports apply, using the JDM enumerations ClassificationApplyContent, RegressionApplyContent, and ClusteringApplyContent.
For classification models, JDM defines four possible contents: predicted category, probability, cost, and node id. The predicted category content places the predicted target value in the apply output; similarly, the probability and cost contents place the probability or cost corresponding to the predicted target value in the output. The node id content is specific to rule-based models, such as decision trees, that use a specific tree node or rule to make a prediction. When node id content is specified, the id of the node that produced the prediction is provided in the apply result. The node id is useful for showing why a given prediction was made.
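To make the four classification contents concrete, here is a small self-contained sketch of how the requested contents determine which columns appear in an apply output row. It deliberately does not use the JDM classes: the `Content` enum is a local stand-in for the contents listed above, and `ApplyContentSketch`, `Prediction`, `applyOutput`, the column names, and the sample values are all hypothetical names and data introduced for illustration.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class ApplyContentSketch {

    /** Local stand-in for the four classification apply contents (not the JDM type). */
    enum Content { PREDICTED_CATEGORY, PROBABILITY, COST, NODE_ID }

    /** Hypothetical result of scoring one case with a decision tree model. */
    static final class Prediction {
        final String category;
        final double probability;
        final double cost;
        final String nodeId;

        Prediction(String category, double probability, double cost, String nodeId) {
            this.category = category;
            this.probability = probability;
            this.cost = cost;
            this.nodeId = nodeId;
        }
    }

    /** Builds one output row containing only the requested contents. */
    static Map<String, Object> applyOutput(Prediction p, Set<Content> requested) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (Content c : requested) {
            switch (c) {
                case PREDICTED_CATEGORY: row.put("predictedCategory", p.category); break;
                case PROBABILITY:        row.put("probability", p.probability);    break;
                case COST:               row.put("cost", p.cost);                  break;
                case NODE_ID:            row.put("nodeId", p.nodeId);              break;
            }
        }
        return row;
    }

    public static void main(String[] args) {
        // Illustrative values only; they do not come from a real model.
        Prediction p = new Prediction("Attriter", 0.82, 12.5, "node-5");

        // Ask for the predicted value plus the tree node that produced it.
        Set<Content> requested = EnumSet.of(Content.PREDICTED_CATEGORY, Content.NODE_ID);
        System.out.println(applyOutput(p, requested));
    }
}
```

Requesting the node id alongside the predicted category mirrors the use case in the text: the node identifies the rule that fired, which is what lets an analyst explain the prediction.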
(a)
Predicted:      Attriter             Non-Attriter          Top Prediction
Probability     0.30                 0.70                  Non-Attriter
Cost            0.7 × $50 = $30      0.3 × $150 = $45      Attriter

(b)
                        Predicted Attriter    Predicted Non-Attriter
Actual Attriter         0 (TP)                $150 (FN)
Actual Non-Attriter     $50 (FP)              0 (TN)

Figure 8-8 Prediction Costs. (a) Computation of costs based on the (b) specified cost matrix.
javax.datamining.Enum
    ClassificationApplyContent: predictedCategory, probability, cost, nodeId
    RegressionApplyContent: predictedValue, confidence
    ClusteringApplyContent: clusterIdentifier, probability, qualityOfFit, distance

Figure 8-9 Apply contents.