10 tree depth, AUC = 76.26%
20 tree depth, AUC = 98.45%
Next, we will perform the same computation using the Gini impurity measure (we omitted the code as it is very similar, but it can be found in the code bundle; a rough sketch is also shown after the results below). Your results should look something like this:
1 tree depth, AUC = 59.33%
2 tree depth, AUC = 61.68%
3 tree depth, AUC = 62.61%
4 tree depth, AUC = 63.63%
5 tree depth, AUC = 64.89%
10 tree depth, AUC = 78.37%
20 tree depth, AUC = 98.87%
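For reference, a minimal sketch of the omitted computation might look like the following. It mirrors the entropy-based loop shown earlier in the chapter; the training helper name (trainDTWithParams) and the cached LabeledPoint RDD name (data) are assumed from the earlier sections rather than repeated from the book's exact code:
import org.apache.spark.mllib.tree.impurity.Gini
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Vary the tree depth using Gini impurity and compute the AUC for each setting.
// Assumes the decision tree training helper and the LabeledPoint RDD (data)
// defined earlier in the chapter.
val dtResultsGini = Seq(1, 2, 3, 4, 5, 10, 20).map { param =>
  val model = trainDTWithParams(data, param, Gini)
  val scoreAndLabels = data.map { point =>
    val score = model.predict(point.features)
    (if (score > 0.5) 1.0 else 0.0, point.label)
  }
  val metrics = new BinaryClassificationMetrics(scoreAndLabels)
  (s"$param tree depth", metrics.areaUnderROC)
}
dtResultsGini.foreach { case (param, auc) =>
  println(f"$param, AUC = ${auc * 100}%2.2f%%")
}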
As you can see from the preceding results, increasing the tree depth parameter results in a more accurate model (this is expected, since the model is allowed to become more complex with greater tree depth). It is very likely that at higher tree depths, the model will significantly overfit the dataset.
There is very little difference in performance between the two impurity measures.
The naïve Bayes model
Finally, let's see the impact of changing the lambda parameter for naïve Bayes. This
parameter controls additive smoothing, which handles the case when a class and feature
value do not occur together in the dataset.
Tip
See http://en.wikipedia.org/wiki/Additive_smoothing for more details on additive smoothing.
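Concretely (this is the standard multinomial formulation of additive smoothing, not code or notation taken from this book), with a smoothing parameter \(\lambda\) the conditional probability of seeing feature value \(f\) in class \(c\) is estimated as
\[ P(f \mid c) = \frac{N_{cf} + \lambda}{N_c + \lambda\,|F|} \]
where \(N_{cf}\) is the number of times \(f\) occurs with class \(c\), \(N_c\) is the total feature count for class \(c\), and \(|F|\) is the number of distinct features. With \(\lambda = 0\) this reduces to the raw maximum-likelihood estimate (which assigns zero probability to unseen class-feature pairs), while larger values of \(\lambda\) pull all estimates towards a uniform distribution.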
We will take the same approach as we did earlier, first creating a convenience training function and training the model with varying levels of lambda:
// Helper to train a naive Bayes model with a given lambda (additive smoothing) value
def trainNBWithParams(input: RDD[LabeledPoint], lambda: Double) = {
  val nb = new NaiveBayes
  nb.setLambda(lambda)
  nb.run(input)
}
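The extract ends here, but as a usage sketch, the helper would then be applied across a range of lambda values and evaluated in the same way as the earlier models. The nbData RDD name and the specific lambda values below are illustrative assumptions rather than the book's exact code:
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

// Train a model for each (assumed) lambda value and compute the AUC,
// using the (assumed) nbData RDD of LabeledPoint prepared for naive Bayes.
val nbResults = Seq(0.001, 0.01, 0.1, 1.0, 10.0).map { param =>
  val model = trainNBWithParams(nbData, param)
  val scoreAndLabels = nbData.map { point =>
    (model.predict(point.features), point.label)
  }
  val metrics = new BinaryClassificationMetrics(scoreAndLabels)
  (s"$param lambda", metrics.areaUnderROC)
}
nbResults.foreach { case (param, auc) =>
  println(f"$param, AUC = ${auc * 100}%2.2f%%")
}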