node size to avoid tree nodes with low support, maximum confidence
to avoid pure nodes, and minimum decrease in impurity to avoid node
splits that yield only a minimal increase in predictive accuracy.
Users can specify one or more of these stopping criteria, and the tree
grows until the first stopping criterion is met.
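The way several stopping criteria interact can be illustrated with a minimal Python sketch. The function name `should_stop`, its parameter names, and the order in which the criteria are checked are assumptions made for illustration; they are not taken from any particular decision tree implementation. Here "confidence" is taken to be the proportion of the majority target value in the node.

```python
from collections import Counter

def should_stop(labels, impurity_decrease, min_node_size=None,
                max_confidence=None, min_impurity_decrease=None):
    """Return the name of the first stopping criterion met, or None.

    labels: target values of the cases in the node being considered.
    impurity_decrease: impurity reduction offered by the best candidate split.
    Criteria left as None are treated as not specified by the user.
    """
    n = len(labels)
    # Minimum node size: do not split nodes with low support.
    if min_node_size is not None and n < min_node_size:
        return "min_node_size"
    # Maximum confidence: stop once the majority class is dominant enough.
    if max_confidence is not None and n > 0:
        confidence = Counter(labels).most_common(1)[0][1] / n
        if confidence >= max_confidence:
            return "max_confidence"
    # Minimum impurity decrease: skip splits that barely improve the tree.
    if (min_impurity_decrease is not None
            and impurity_decrease < min_impurity_decrease):
        return "min_impurity_decrease"
    return None
```

A tree builder would call such a check before splitting each node and turn the node into a leaf as soon as a criterion name is returned.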
Pruning is the process of removing the less significant tree nodes,
for example, those with insufficient support. There are two types of
pruning: pre-pruning and post-pruning. Pre-pruning avoids insignificant
node splits while building the tree by measuring the goodness of the
split. Post-pruning removes the insignificant nodes after building a
fully grown tree. Different measures called tree homogeneity metrics
are used to define the goodness of a node split, such as Gini, entropy,
mean absolute deviation, mean square error, and misclassification ratio. The
improvement in homogeneity that a split achieves is often called information
gain. Refer to [Han/Kamber 2006] for more details about the tree homogeneity metrics.
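To make the classification metrics concrete, the following Python sketch computes Gini, entropy, and misclassification ratio for the target values in a node, along with the decrease in impurity for a two-way split. The function names are illustrative assumptions, and the regression-oriented metrics (mean absolute deviation, mean square error) are not covered here.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Proportion of each distinct target value among the node's cases."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini(labels):
    return 1.0 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    # Proportions come from observed counts, so p is never zero here.
    return -sum(p * math.log2(p) for p in class_proportions(labels))

def misclassification(labels):
    return 1.0 - max(class_proportions(labels))

def impurity_decrease(parent, left, right, metric=gini):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * metric(left) + (len(right) / n) * metric(right)
    return metric(parent) - weighted
```

All three metrics are zero for a pure node and largest when the target values are evenly mixed, so a larger decrease indicates a better split. With `metric=entropy`, `impurity_decrease` corresponds to what is usually called information gain.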
The naïve bayes algorithm is one of the fastest classification algorithms.
Its results are comparable to those of other classification algorithms,
and it often outperforms them. Naïve bayes works well
with large volumes of data.
Naïve bayes is based on Bayes' theorem [Han/Kamber 2006] and
assumes that the predictor attributes are conditionally independent²
[Wikipedia-CI 2006] of each other with respect to the target attribute.
This assumption significantly reduces the number of computations
required to predict a target value, and hence the naïve bayes algorithm
performs well with large volumes of data.
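The effect of the independence assumption can be sketched in a few lines of Python: the model only needs counts of each target value and of each (attribute value, target value) pair, and a prediction multiplies the corresponding ratios. The function names, the attribute names, and the toy records below are invented for illustration and are not drawn from any dataset in this book. The sketch also omits smoothing, so an attribute value never seen with a target value drives that target's score to zero.

```python
from collections import Counter, defaultdict

def train(records, target):
    """Count target values and attribute values per target value."""
    target_counts = Counter(r[target] for r in records)
    # (attribute, target value) -> Counter of attribute values
    pair_counts = defaultdict(Counter)
    for r in records:
        for attr, value in r.items():
            if attr != target:
                pair_counts[(attr, r[target])][value] += 1
    return target_counts, pair_counts

def predict(model, case):
    """Return normalized scores P(target | case) under the independence assumption."""
    target_counts, pair_counts = model
    total = sum(target_counts.values())
    scores = {}
    for t, t_count in target_counts.items():
        score = t_count / total  # prior P(target)
        for attr, value in case.items():
            score *= pair_counts[(attr, t)][value] / t_count  # P(value | target)
        scores[t] = score
    norm = sum(scores.values())
    return {t: s / norm for t, s in scores.items()} if norm else scores

# Hypothetical, already-binned toy records (not real data).
records = [
    {"age": "young", "savings": "low", "target": "yes"},
    {"age": "young", "savings": "low", "target": "yes"},
    {"age": "old", "savings": "low", "target": "yes"},
    {"age": "young", "savings": "high", "target": "yes"},
    {"age": "old", "savings": "high", "target": "no"},
    {"age": "old", "savings": "high", "target": "no"},
    {"age": "young", "savings": "high", "target": "no"},
    {"age": "old", "savings": "low", "target": "no"},
]
model = train(records, "target")
print(predict(model, {"age": "young", "savings": "low"}))
```

For the case shown, the unnormalized scores are 0.5 × 3/4 × 3/4 for "yes" and 0.5 × 1/4 × 1/4 for "no", which normalize to about 0.9 and 0.1. Training is a single counting pass over the data, which is why the approach scales to large volumes.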
The naïve bayes algorithm involves computing the probability of
each target and predictor attribute value combination. To control the
number of such combinations, attributes that have either continuous
values or a high number of distinct values are typically binned. Refer
to Section 3.2 for a more detailed discussion of binning. In this
example, to simplify the description of the naïve bayes algorithm,
consider two attributes, age and savings balance, from the CUSTOMERS
dataset (Table 7-3). These attributes are binned to have two binned
² Two events A and B are conditionally independent given a third event C
precisely if the occurrence or non-occurrence of A and B are independent events
in their conditional probability distribution given C. In other words,
$\Pr(A \cap B \mid C) = \Pr(A \mid C)\,\Pr(B \mid C)$