For U.S. residents, the state value must be the two-letter abbreviation of one of the 50 states or the District of Columbia. To tell the model build which attribute values are valid, a category set can be specified in the logical data specification. The category set characterizes the values found in a categorical attribute. In this example, the category set for the state attribute contains the values {AL, AK, AS, AZ, ..., WY}. State values that are not in this set are considered invalid during the model build; depending on the tool, they may be treated as missing or may cause execution to terminate.
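The category set check described above can be sketched in plain Java. This is only an illustration and does not use the JDM API: the class name, the InvalidPolicy enum, and the abbreviated list of state codes are all assumptions made for the example.

```java
import java.util.Set;

public class CategorySetSketch {
    // Hypothetical, abbreviated category set; a real one would list every valid code.
    static final Set<String> STATE_CATEGORIES = Set.of("AL", "AK", "AS", "AZ", "WY");

    // The two treatments of invalid values mentioned in the text (names are assumptions).
    enum InvalidPolicy { TREAT_AS_MISSING, TERMINATE }

    // Returns the value unchanged when it is in the category set.
    // An invalid value becomes null (missing) or raises an exception, depending on the policy.
    static String check(String value, InvalidPolicy policy) {
        if (value == null) {
            return null; // already missing
        }
        if (STATE_CATEGORIES.contains(value)) {
            return value;
        }
        if (policy == InvalidPolicy.TERMINATE) {
            throw new IllegalArgumentException("Invalid state value: " + value);
        }
        return null;
    }
}
```

With TREAT_AS_MISSING, a value such as "XX" comes back as null and the build can continue; with TERMINATE, the same value stops processing with an exception.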

Our CUSTOMERS dataset has a disproportionate number of Non-attriters: 20 percent of the cases are Attriters, and 80 percent are Non-attriters. To build an unbiased model, the data miner uses stratified sampling to balance the input dataset so that it contains an equal number of cases for each target value. In JDM, prior probabilities are used to represent the original distribution of the target attribute values. The prior probabilities should be specified whenever the original target value distribution has been changed, so that the algorithm can take them into account. However, not all algorithms support prior probability specification, so you will need to consult a given tool's documentation.
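To make the role of prior probabilities concrete, here is a minimal plain-Java sketch. It does not use the JDM API, and the class name, method names, and case counts (200 Attriters, 800 Non-attriters, chosen to match the 20/80 split) are assumptions. The adjust method shows one common way a probability from a model built on balanced data can be rescaled to the original distribution; a given algorithm may use the priors differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PriorProbabilitySketch {
    // Prior probability of each target value = its share of the original, unbalanced data.
    static Map<String, Double> priors(Map<String, Integer> originalCounts) {
        int total = 0;
        for (int count : originalCounts.values()) {
            total += count;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : originalCounts.entrySet()) {
            result.put(e.getKey(), e.getValue() / (double) total);
        }
        return result;
    }

    // Rescales P(Attriter) produced on the sampled data back to the original distribution.
    // priorOriginal: share of Attriters in the original data (0.2 in the text).
    // priorSample:   share of Attriters in the balanced sample (0.5 in the text).
    static double adjust(double pSampled, double priorOriginal, double priorSample) {
        double attriter = pSampled * priorOriginal / priorSample;
        double nonAttriter = (1 - pSampled) * (1 - priorOriginal) / (1 - priorSample);
        return attriter / (attriter + nonAttriter);
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("Attriter", 200);      // hypothetical counts
        counts.put("Non-attriter", 800);
        System.out.println(priors(counts));
        System.out.println(adjust(0.5, 0.2, 0.5));
    }
}
```

A score of 0.5 on the balanced sample carries no evidence either way, so after adjustment it falls back to the original Attriter rate of about 0.2.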

ABCBank management informed the data miner that it is more expensive when an Attriter is misclassified, that is, predicted as a Non-attriter. This is because losing an existing customer and acquiring a new one costs much more than trying to retain the existing customer. For this purpose, JDM allows a cost matrix to be specified, giving the costs associated with possible false predictions. A cost matrix is an N × N table that defines the cost associated with incorrect predictions, where N is the number of possible target values. In this example, the data miner specifies a cost matrix indicating that predicting that a customer will not attrite when the customer in fact will is three times costlier than predicting that the customer will attrite when the customer actually will not. The cost matrix for this problem is illustrated in Figure 7-2.

                             Predicted
                        Attriter    Non-attriter
Actual   Attriter       0 (TP)      $150 (FN)
         Non-attriter   $50 (FP)    0 (TN)

Figure 7-2 Cost matrix table.
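As a rough illustration of how a cost matrix can influence a prediction, the plain-Java sketch below stores the costs from Figure 7-2 and picks the prediction with the lower expected cost for a given probability of attrition. It does not use the JDM API; the class, constant, and method names are assumptions, and minimizing expected cost is only one way an algorithm might apply the matrix.

```java
public class CostMatrixSketch {
    static final int ATTRITER = 0;
    static final int NON_ATTRITER = 1;

    // COST[actual][predicted], taken from Figure 7-2.
    static final double[][] COST = {
        {0.0, 150.0}, // actual Attriter:     TP = 0,    FN = $150
        {50.0, 0.0}   // actual Non-attriter: FP = $50,  TN = 0
    };

    // Expected cost of making the given prediction when P(actual = Attriter) is pAttriter.
    static double expectedCost(int predicted, double pAttriter) {
        return pAttriter * COST[ATTRITER][predicted]
             + (1 - pAttriter) * COST[NON_ATTRITER][predicted];
    }

    // Chooses the prediction with the lower expected cost (ties go to Attriter).
    static int cheapestPrediction(double pAttriter) {
        return expectedCost(ATTRITER, pAttriter) <= expectedCost(NON_ATTRITER, pAttriter)
                ? ATTRITER
                : NON_ATTRITER;
    }
}
```

With these costs the break-even point is where 150p = 50(1 - p), that is p = 0.25, so under this rule a customer is flagged as an Attriter once the estimated probability of attrition reaches 25 percent rather than 50 percent.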
