values to further simplify this discussion. For age, bin-1 contains values less than or equal to 35 and bin-2 contains the values greater than 35. For savings balance, bin-1 contains values less than or equal to $20,000 and bin-2 contains values greater than $20,000. In JDM, a naïve Bayes algorithm computes the probabilities of a target value for a given attribute value using the cases in the build dataset. In this example, we have two attributes, each with two binned values, and a binary target.
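The two-bin discretization described above can be sketched in Java as follows. This is a minimal illustration, not JDM code: the class `TwoBinDiscretizer` and its methods are hypothetical names. The only facts taken from the text are the bin boundaries: age ≤ 35 and savings balance ≤ $20,000 go to bin-1, and larger values go to bin-2.

```java
/**
 * Hypothetical sketch of the two-bin discretization described in the text.
 * Class and method names are illustrative only; they are not part of the JDM API.
 */
public class TwoBinDiscretizer {

    // Bin boundaries from the text: age <= 35 and savings balance <= $20,000 are bin-1.
    public static final double AGE_BOUNDARY = 35.0;
    public static final double SAVINGS_BOUNDARY = 20_000.0;

    /** Returns 1 (bin-1) if value <= boundary, otherwise 2 (bin-2). */
    public static int bin(double value, double boundary) {
        return value <= boundary ? 1 : 2;
    }

    public static int ageBin(double age) {
        return bin(age, AGE_BOUNDARY);
    }

    public static int savingsBin(double balance) {
        return bin(balance, SAVINGS_BOUNDARY);
    }

    public static void main(String[] args) {
        // A customer aged 25 with a $13,300 savings balance falls into bin-1 for both attributes.
        System.out.println("age 25 -> bin-" + ageBin(25));
        System.out.println("savings 13300 -> bin-" + savingsBin(13_300));
        // The boundary value itself belongs to bin-1 because the comparison is <=.
        System.out.println("age 35 -> bin-" + ageBin(35));
        System.out.println("age 36 -> bin-" + ageBin(36));
    }
}
```

Using `<=` in a single helper keeps the boundary rule identical for both attributes, so a value exactly at the boundary (35 or $20,000) always lands in bin-1, as the text specifies.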
Listing 7-1 shows the eight possible probabilities that are computed as part of the naïve Bayes model build. Using these probability values, the naïve Bayes algorithm computes the most probable target value for a given new case. In this example, for a new customer whose age is 25 and whose savings balance is $13,300, the probabilities of being an Attriter and a Non-attriter are computed as shown in Listing 7-2. Note that in Listing 7-2, P(Attriter) and P(Non-attriter) are the prior probabilities of the target values that are specified as input to the model build. For this new customer case, the probability of being a Non-attriter (0.31) is greater than that of being an Attriter (0.03), and hence the model predicts this customer as a Non-attriter. For a more detailed discussion of naïve Bayes and Bayesian classification, refer to [Han/Kamber 2006].
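The scoring step can be sketched as multiplying each target value's prior by the probabilities for the customer's binned attribute values, then predicting the target value with the larger score. This is a minimal sketch under stated assumptions. The class `NaiveBayesScoring` is hypothetical. The attribute probabilities are the rounded ratios from Listing 7-1 for the bins this customer falls into: age 25 is in age bin-1, and $13,300 is in savings bin-1. The Non-attriter age probability uses 0.67, which is 4/6 rounded. The priors P(Attriter) = 0.2 and P(Non-attriter) = 0.8 are assumed values, because Listing 7-2 is not reproduced here. With these assumed priors the sketch yields scores of about 0.03 and 0.31.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of naive Bayes scoring for the example customer
 * (age bin-1, savings balance bin-1). The priors 0.2 and 0.8 are ASSUMED values.
 */
public class NaiveBayesScoring {

    /** Unnormalized score: prior multiplied by each per-attribute probability. */
    public static double score(double prior, double... attributeProbabilities) {
        double s = prior;
        for (double p : attributeProbabilities) {
            s *= p;
        }
        return s;
    }

    /** Returns the target value with the highest score. */
    public static String predict(Map<String, Double> scores) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }

    public static Map<String, Double> exampleScores() {
        Map<String, Double> scores = new LinkedHashMap<>();
        // P(Attriter) * P(age <= 35 / Attriter) * P(SB <= 20000 / Attriter); prior 0.2 is assumed
        scores.put("Attriter", score(0.2, 0.33, 0.43));
        // P(Non-attriter) * P(age <= 35 / Non-attriter) * P(SB <= 20000 / Non-attriter); prior 0.8 is assumed
        scores.put("Non-attriter", score(0.8, 0.67, 0.57));
        return scores;
    }

    public static void main(String[] args) {
        Map<String, Double> scores = exampleScores();
        scores.forEach((target, s) -> System.out.printf("%s: %.2f%n", target, s));
        System.out.println("Prediction: " + predict(scores));
    }
}
```

Running `main` prints `Attriter: 0.03`, `Non-attriter: 0.31` and `Prediction: Non-attriter`. The scores are not normalized to sum to 1. Dividing each score by their sum would turn them into posterior probabilities without changing which target value is predicted.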
Algorithm Settings
In JDM, a naïve Bayes algorithm has two settings, singleton threshold and pairwise threshold, that are used to define which predictor attribute values or predictor-target value pairs should be ignored.
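One way to picture these two settings is the sketch below. It is hypothetical and rests on an assumption not stated in the text: each threshold is taken to be a minimum fraction of build cases. Under that reading, a predictor value that occurs in fewer cases than the singleton threshold allows is ignored, and so is a predictor-target pair that occurs less often than the pairwise threshold allows. The class `ThresholdFilter` and the threshold value 0.1 are illustrative only. The counts come from Listing 7-1, where the bin sizes add up to 10 cases and each numerator counts the cases in a bin that have the given target value.

```java
/**
 * Hypothetical sketch of singleton/pairwise threshold filtering.
 * ASSUMPTION: each threshold is a minimum fraction of build cases. This is an
 * illustration only, not the JDM implementation.
 */
public class ThresholdFilter {

    /** True if a predictor value seen in valueCount of totalCases cases is kept. */
    public static boolean keepSingleton(int valueCount, int totalCases, double singletonThreshold) {
        return (double) valueCount / totalCases >= singletonThreshold;
    }

    /** True if a predictor-value/target-value pair seen in pairCount cases is kept. */
    public static boolean keepPair(int pairCount, int totalCases, double pairwiseThreshold) {
        return (double) pairCount / totalCases >= pairwiseThreshold;
    }

    public static void main(String[] args) {
        int totalCases = 10;              // bin sizes in Listing 7-1: 6 + 4 and 7 + 3
        double singletonThreshold = 0.1;  // hypothetical setting
        double pairwiseThreshold = 0.1;   // hypothetical setting

        // SB > 20000 occurs in 3 of 10 cases, so the value is kept.
        System.out.println(keepSingleton(3, totalCases, singletonThreshold)); // true
        // (SB > 20000, Non-attriter) occurs in 0 cases, so the pair is ignored.
        System.out.println(keepPair(0, totalCases, pairwiseThreshold));       // false
        // (SB > 20000, Attriter) occurs in 3 cases, so the pair is kept.
        System.out.println(keepPair(3, totalCases, pairwiseThreshold));       // true
    }
}
```

With a pairwise threshold above zero, the (SB > 20000, Non-attriter) pair, whose probability in Listing 7-1 is 0.00, would be dropped instead of contributing a zero factor to the product for Non-attriter.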
Listing 7-1 Naïve Bayes algorithm computation of probabilities using build data

Notation                       | Ratio | Probability | Description
P( age ≤ 35 / Attriter )       | 2/6   | 0.33        | Probability of age ≤ 35 when the customer is Attriter
P( age ≤ 35 / Non-attriter )   | 4/6   | 0.67        | Probability of age ≤ 35 when the customer is Non-attriter
P( age > 35 / Attriter )       | 3/4   | 0.75        | Probability of age > 35 when the customer is Attriter
P( age > 35 / Non-attriter )   | 1/4   | 0.25        | Probability of age > 35 when the customer is Non-attriter
P( SB ≤ 20000 / Attriter )     | 3/7   | 0.43        | Probability of savings balance (SB) ≤ 20000 when the customer is Attriter
P( SB ≤ 20000 / Non-attriter ) | 4/7   | 0.57        | Probability of savings balance (SB) ≤ 20000 when the customer is Non-attriter
P( SB > 20000 / Attriter )     | 3/3   | 1.00        | Probability of savings balance (SB) > 20000 when the customer is Attriter
P( SB > 20000 / Non-attriter ) | 0/3   | 0.00        | Probability of savings balance (SB) > 20000 when the customer is Non-attriter
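As a quick check on the arithmetic in Listing 7-1, the short program below recomputes each probability from the count ratio shown in the listing and rounds it to two decimal places. Each ratio is read as the number of cases in the bin with the given target value divided by the number of cases in the bin. The class name `ListingRatios` is hypothetical. The check shows, for example, that 4/6 rounds to 0.67.

```java
/**
 * Recomputes the probabilities in Listing 7-1 from the count ratios shown there.
 * The class name is hypothetical; the counts are copied from the listing.
 */
public class ListingRatios {

    /** Returns count / binSize rounded to two decimal places. */
    public static double ratio(int count, int binSize) {
        return Math.round(100.0 * count / binSize) / 100.0;
    }

    public static void main(String[] args) {
        String[] labels = {
            "P( age <= 35 / Attriter )", "P( age <= 35 / Non-attriter )",
            "P( age > 35 / Attriter )", "P( age > 35 / Non-attriter )",
            "P( SB <= 20000 / Attriter )", "P( SB <= 20000 / Non-attriter )",
            "P( SB > 20000 / Attriter )", "P( SB > 20000 / Non-attriter )"
        };
        // {cases with the target value in the bin, cases in the bin}, in the same order as labels
        int[][] counts = {{2, 6}, {4, 6}, {3, 4}, {1, 4}, {3, 7}, {4, 7}, {3, 3}, {0, 3}};

        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-32s %d/%d = %.2f%n",
                    labels[i], counts[i][0], counts[i][1], ratio(counts[i][0], counts[i][1]));
        }
    }
}
```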