
values to further simplify this discussion. For age, bin-1 contains values less than or equal to 35 and bin-2 contains values greater than 35. For savings balance, bin-1 contains values less than or equal to $20,000 and bin-2 contains values greater than $20,000. In JDM, a naïve Bayes algorithm computes the probabilities of a target value for a given attribute value using the cases in the build dataset. In this example, we have two attributes, each with two binned values, and a binary target.
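The binning rules above, and the kind of counting the build step performs, can be sketched in plain Java (records need Java 16 or later). This is a minimal illustration, not the JDM API: the class `BinningSketch`, the `Customer` record, the helper `pBinGivenTarget`, and the four-customer dataset are all hypothetical. The helper estimates the class-conditional fraction P(bin | target), which is the form the naïve Bayes product uses.

```java
import java.util.List;
import java.util.function.ToIntFunction;

public class BinningSketch {

    // Hypothetical case record: age in years, savings balance in dollars, target value.
    record Customer(int age, double savingsBalance, boolean attriter) {}

    // Age: bin-1 holds values <= 35, bin-2 holds values > 35.
    static int ageBin(int age) {
        return age <= 35 ? 1 : 2;
    }

    // Savings balance: bin-1 holds values <= $20,000, bin-2 holds values > $20,000.
    static int savingsBin(double balance) {
        return balance <= 20_000 ? 1 : 2;
    }

    // Fraction of the cases with the given target value whose attribute falls in `bin`,
    // i.e. an estimate of P(bin | target) from the build data.
    static double pBinGivenTarget(List<Customer> cases, ToIntFunction<Customer> binOf,
                                  int bin, boolean attriter) {
        long inClass = 0;
        long inClassAndBin = 0;
        for (Customer c : cases) {
            if (c.attriter() == attriter) {
                inClass++;
                if (binOf.applyAsInt(c) == bin) {
                    inClassAndBin++;
                }
            }
        }
        return inClass == 0 ? 0.0 : (double) inClassAndBin / inClass;
    }

    public static void main(String[] args) {
        // Hypothetical build data; this is NOT the dataset behind Listing 7-1.
        List<Customer> build = List.of(
                new Customer(25, 13_300, false),
                new Customer(30, 25_000, true),
                new Customer(42, 8_000, true),
                new Customer(50, 40_000, false));

        System.out.println(ageBin(25) + " " + savingsBin(13_300)); // prints "1 1"
        System.out.println(pBinGivenTarget(build, c -> ageBin(c.age()), 1, true));
        System.out.println(pBinGivenTarget(build, c -> savingsBin(c.savingsBalance()), 1, false));
    }
}
```

Passing the binning rule as a `ToIntFunction` lets the same counting helper serve both attributes.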

Listing 7-1 shows the eight probabilities that are computed as part of the naïve Bayes model build. Using these probability values, the naïve Bayes algorithm computes the most probable target value for a new case. In this example, for a new customer whose age is 25 and savings balance is $13,300, the probabilities of being an Attriter and a Non-attriter are computed as shown in Listing 7-2. Note that in Listing 7-2, P(Attriter) and P(Non-attriter) are the prior probabilities of the target values, which are specified as input to the model build. For this new customer case, the probability of being a Non-attriter (0.31) is greater than that of being an Attriter (0.03), and hence the model predicts this customer as a Non-attriter. For a more detailed discussion of naïve Bayes and Bayesian classification, refer to [Han/Kamber 2006].
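To make the prediction step concrete, the Java sketch below multiplies each prior by the Listing 7-1 values for the new customer's bins: age 25 and a savings balance of $13,300 both fall in bin-1. It uses the listing's two-decimal values, with 4/6 taken as 0.67. The priors of 0.2 (Attriter) and 0.8 (Non-attriter) are assumptions chosen for illustration, since Listing 7-2 is not reproduced in this section; the class and method names are hypothetical.

```java
import java.util.Locale;

public class NaiveBayesPredictSketch {

    // Listing 7-1 values for bin-1 of each attribute, rounded to two decimals
    // (4/6 is taken as 0.67).
    static final double P_AGE_BIN1_ATTRITER = 0.33;
    static final double P_AGE_BIN1_NON_ATTRITER = 0.67;
    static final double P_SB_BIN1_ATTRITER = 0.43;
    static final double P_SB_BIN1_NON_ATTRITER = 0.57;

    // ASSUMED priors, for illustration only (not taken from Listing 7-2).
    static final double PRIOR_ATTRITER = 0.2;
    static final double PRIOR_NON_ATTRITER = 0.8;

    // Naive Bayes score: the prior times the product of the per-attribute values.
    static double score(double prior, double... conditionals) {
        double s = prior;
        for (double p : conditionals) {
            s *= p;
        }
        return s;
    }

    // The target with the larger score is the prediction.
    static String predict(double attriterScore, double nonAttriterScore) {
        return nonAttriterScore > attriterScore ? "Non-attriter" : "Attriter";
    }

    public static void main(String[] args) {
        // New customer: age 25 -> age bin-1, savings $13,300 -> savings bin-1.
        double attriter = score(PRIOR_ATTRITER, P_AGE_BIN1_ATTRITER, P_SB_BIN1_ATTRITER);
        double nonAttriter = score(PRIOR_NON_ATTRITER, P_AGE_BIN1_NON_ATTRITER, P_SB_BIN1_NON_ATTRITER);

        // Prints "Attriter=0.03 Non-attriter=0.31 -> Non-attriter"
        System.out.printf(Locale.ROOT, "Attriter=%.2f Non-attriter=%.2f -> %s%n",
                attriter, nonAttriter, predict(attriter, nonAttriter));
    }
}
```

The two scores are unnormalized, which is why they do not sum to 1; only their relative size matters for the prediction, and dividing each by their sum would turn them into posterior probabilities.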

Algorithm Settings

In JDM, the naïve Bayes algorithm has two settings, singleton threshold and pairwise threshold, that define which predictor attribute values or predictor-target value pairs should be ignored.
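One way to picture these two settings is as minimum-frequency filters. The sketch below is not the JDM API and rests on an assumed interpretation: each threshold is treated as a fraction of the build cases, so a predictor value (singleton) or a predictor-target pair (pairwise) that occurs in fewer than that fraction of cases is ignored. The class `ThresholdSketch`, the method `isKept`, and all numbers are hypothetical.

```java
public class ThresholdSketch {

    // ASSUMED semantics: an item (a predictor value, or a predictor-target pair) is kept
    // only if it occurs in at least `threshold` (a fraction from 0 to 1) of the build cases.
    static boolean isKept(long occurrences, long totalCases, double threshold) {
        if (totalCases <= 0) {
            return false;
        }
        return (double) occurrences / totalCases >= threshold;
    }

    public static void main(String[] args) {
        double singletonThreshold = 0.05; // hypothetical setting value
        double pairwiseThreshold = 0.05;  // hypothetical setting value
        long totalCases = 10;             // hypothetical build size

        // Predictor value seen in 6 of 10 cases: 0.6 >= 0.05, so it is kept.
        System.out.println(isKept(6, totalCases, singletonThreshold));
        // Predictor-target pair seen in 0 of 10 cases: 0.0 < 0.05, so it is ignored.
        System.out.println(isKept(0, totalCases, pairwiseThreshold));
    }
}
```

Under this reading, a pair with a zero count, such as a bin that never co-occurs with a target value, is dropped whenever the pairwise threshold is above zero.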

Listing 7-1

Naïve Bayes algorithm computation of probabilities using build data

Probability of age ≤ 35 when the customer is Attriter
P( age ≤ 35 | Attriter ) = 2/6 = 0.33

Probability of age ≤ 35 when the customer is Non-attriter
P( age ≤ 35 | Non-attriter ) = 4/6 = 0.67

Probability of age > 35 when the customer is Attriter
P( age > 35 | Attriter ) = 3/4 = 0.75

Probability of age > 35 when the customer is Non-attriter
P( age > 35 | Non-attriter ) = 1/4 = 0.25

Probability of savings balance (SB) ≤ 20000 when the customer is Attriter
P( SB ≤ 20000 | Attriter ) = 3/7 = 0.43

Probability of savings balance (SB) ≤ 20000 when the customer is Non-attriter
P( SB ≤ 20000 | Non-attriter ) = 4/7 = 0.57

Probability of savings balance (SB) > 20000 when the customer is Attriter
P( SB > 20000 | Attriter ) = 3/3 = 1.00

Probability of savings balance (SB) > 20000 when the customer is Non-attriter
P( SB > 20000 | Non-attriter ) = 0/3 = 0.00
