Java Reference
In-Depth Information
U.S. residents must have the state value as a two-letter abbreviation
of one of the 50 states or the District of Columbia. To indicate valid
attribute values to the model build, a category set can be specified in
the logical data specification. The category set characterizes the values
found in a categorical attribute. In this example, the category set for
the state attribute contains values {AL, AK, AS, AZ, ..., WY}. The
state values that are not in this set will be considered as invalid
values during the model build, and may be treated as missing or
terminate execution.
Our CUSTOMERS dataset has a disproportionate number of
Non-attriters : 20 percent of the cases are Attriters , and 80 percent are
Non-attriters . To build an unbiased model, the data miner balances
the input dataset to contain an equal number of cases with each tar-
get value using stratified sampling. In JDM, prior probabilities are
used to represent the original distribution of attribute values. The
prior probabilities should be specified when the original target
value distribution is changed, so that the algorithm can consider
them appropriately. However, not all algorithms support prior
probability specification, so you will need to consult a given tool's
documentation.
ABCBank management informed the data miner that it is more
expensive when an attriter is misclassified, that is, predicted as a
Non-attriter . This is because losing an existing customer and acquiring
a new customer costs much more than trying to retain an existing
customer. For this, JDM allows the specification of a cost matrix to
specify costs associated with possible false predictions . A cost matrix
is an N
N table that defines the cost associated with incorrect
predictions, where N is the number of possible target values. In this
example, the data miner specifies a cost matrix indicating that
predicting a customer would not attrite when in fact he would is
three times costlier than predicting the customer would attrite
when he actually would not. The cost matrix for this problem is
illustrated in Figure 7-2.
Predicted
Attriter
Non-attriter
Attriter
0 (TP)
$50 (FP)
$150 (FN)
Actual
Non-attriter
0 (TN)
Figure 7-2
Cost matrix table.
Search WWH ::




Custom Search