
node size to avoid tree nodes with low support, maximum confidence to avoid pure nodes, and minimum decrease in impurity to avoid node splits that gain only a minimal increase in predictive accuracy. Users can specify one or more of these stopping criteria, and the tree will grow until the first stopping criterion is met.
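
The stopping test described above can be sketched as follows. This is an illustrative sketch only; the threshold values, method, and field names are ours, not part of any particular library.

```java
// Sketch of decision-tree stopping criteria: growth at a node stops as soon
// as the FIRST criterion is met. All names and thresholds are illustrative.
public class StoppingCriteria {
    int minNodeSize = 10;              // minimum node size: avoid low-support nodes
    double maxConfidence = 0.95;       // maximum confidence: stop on (nearly) pure nodes
    double minImpurityDecrease = 0.01; // minimum decrease in impurity for a split

    boolean shouldStop(int nodeSize, double nodeConfidence, double impurityDecrease) {
        if (nodeSize < minNodeSize) return true;           // too few records
        if (nodeConfidence >= maxConfidence) return true;  // node already pure enough
        if (impurityDecrease < minImpurityDecrease) return true; // split gains too little
        return false;
    }

    public static void main(String[] args) {
        StoppingCriteria c = new StoppingCriteria();
        System.out.println(c.shouldStop(5, 0.6, 0.2));   // true: low support
        System.out.println(c.shouldStop(50, 0.98, 0.2)); // true: pure node
        System.out.println(c.shouldStop(50, 0.6, 0.2));  // false: keep splitting
    }
}
```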

Pruning is the process of removing the less significant tree nodes, for example, those with insufficient support. There are two types of pruning: pre-pruning and post-pruning. Pre-pruning avoids insignificant node splits while building the tree by measuring the goodness of the split. Post-pruning removes the insignificant nodes after building a fully grown tree. Different measures, called tree homogeneity metrics, are used to define the goodness of a node split, such as Gini, entropy, mean absolute deviation, mean square error, and misclassification ratio. Tree homogeneity metrics are also known as information gain. Refer to [Han/Kamber 2006] for more details about the tree homogeneity metrics.
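
Two of the homogeneity metrics named above, Gini and entropy, can be computed directly from the class proportions at a node. The sketch below uses the standard definitions; the class and method names are ours.

```java
// Gini index and entropy of a node, given the proportion of records in each
// target class. Both are 0 for a pure node and grow as the node gets more mixed.
public class Homogeneity {
    // Gini = 1 - sum(p_i^2)
    static double gini(double[] p) {
        double sumSq = 0.0;
        for (double pi : p) sumSq += pi * pi;
        return 1.0 - sumSq;
    }

    // Entropy = -sum(p_i * log2(p_i)), skipping empty classes
    static double entropy(double[] p) {
        double h = 0.0;
        for (double pi : p)
            if (pi > 0) h -= pi * (Math.log(pi) / Math.log(2));
        return h;
    }

    public static void main(String[] args) {
        double[] pure = {1.0, 0.0};
        double[] even = {0.5, 0.5};
        System.out.println(gini(pure));    // 0: pure node
        System.out.println(gini(even));    // 0.5: maximally mixed two-class node
        System.out.println(entropy(even)); // ~1 bit
    }
}
```

The goodness of a split is then the decrease in the chosen metric from the parent node to the weighted average of its children.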

Naïve Bayes

The naïve Bayes algorithm is one of the fastest classification algorithms. It produces results comparable to, and often better than, those of other classification algorithms, and it works well with large volumes of data.

Overview

Naïve Bayes is based on Bayes' theorem [Han/Kamber 2006] and assumes that the predictor attributes are conditionally independent² [Wikipedia-CI 2006] of each other with respect to the target attribute. This assumption significantly reduces the number of computations required to predict a target value, and hence the naïve Bayes algorithm performs well with large volumes of data.
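
Under the conditional-independence assumption, a target value can be scored by simply multiplying its prior probability by the per-attribute conditional probabilities. The sketch below illustrates this; the probabilities and target names are made-up numbers for illustration, not values from the CUSTOMERS dataset.

```java
// Minimal naïve Bayes scoring sketch: score(target) is proportional to
// P(target) * product over attributes of P(attribute value | target).
// All numbers below are hypothetical, chosen only to illustrate the arithmetic.
public class NaiveBayesSketch {
    static double score(double prior, double[] condProbs) {
        double s = prior;
        for (double p : condProbs) s *= p; // independence lets us just multiply
        return s;
    }

    public static void main(String[] args) {
        // Hypothetical binned predictor values for one customer.
        double classA = score(0.3, new double[] {0.6, 0.7}); // P(A) = 0.3
        double classB = score(0.7, new double[] {0.2, 0.4}); // P(B) = 0.7
        // Predict whichever target value has the higher score.
        System.out.println(classA > classB); // true: class A wins despite lower prior
    }
}
```

Because only one conditional probability per attribute and target value is needed, the table of required probabilities stays small, which is what makes the algorithm fast.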

The naïve Bayes algorithm involves computing the probability of each target and predictor attribute value combination. To control the number of such combinations, attributes that have either continuous values or a high number of distinct values are typically binned. Refer to Section 3.2 for a more detailed discussion of binning. In this example, to simplify the description of the naïve Bayes algorithm, consider two attributes, age and savings balance, from the CUSTOMERS (Table 7-3) dataset. These attributes are binned to have two binned

² Two events A and B are conditionally independent given a third event C precisely if the occurrence or non-occurrence of A and B are independent events in their conditional probability distribution given C. In other words, Pr(A ∩ B | C) = Pr(A | C) Pr(B | C).
