regResults.foreach { case (param, auc) =>
  println(f"$param, AUC = ${auc * 100}%2.2f%%")
}
Your output should look like this:
0.001 L2 regularization parameter, AUC = 66.55%
0.01 L2 regularization parameter, AUC = 66.55%
0.1 L2 regularization parameter, AUC = 66.63%
1.0 L2 regularization parameter, AUC = 66.04%
10.0 L2 regularization parameter, AUC = 35.33%
As we can see, at low levels of regularization there is little impact on model performance. However, as we increase regularization, we can see the impact of under-fitting on our model evaluation: the heavily regularized model is too simple to capture the structure in the data, and the AUC drops sharply.
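For reference, the results above can be produced with a loop over candidate regularization parameters. The sketch below assumes the helper functions trainWithParams and createMetrics, the scaled training dataset scaledDataCats, and the numIterations setting from earlier in the chapter; if your names differ, substitute your own:

import org.apache.spark.mllib.optimization.SquaredL2Updater

// Train a model for each L2 regularization parameter and compute its AUC
val regResults = Seq(0.001, 0.01, 0.1, 1.0, 10.0).map { param =>
  val model = trainWithParams(scaledDataCats, param, numIterations,
    new SquaredL2Updater, 1.0)
  createMetrics(s"$param L2 regularization parameter", scaledDataCats, model)
}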
Tip
You will find similar results when using L1 regularization. Give it a try by performing the same evaluation of the regularization parameter against the AUC measure for L1Updater.
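A minimal sketch of the L1 experiment suggested in the tip, again assuming the trainWithParams and createMetrics helpers and the scaledDataCats dataset from earlier (only the updater changes relative to the L2 run):

import org.apache.spark.mllib.optimization.L1Updater

// Same evaluation as before, swapping in L1 regularization
val l1Results = Seq(0.001, 0.01, 0.1, 1.0, 10.0).map { param =>
  val model = trainWithParams(scaledDataCats, param, numIterations,
    new L1Updater, 1.0)
  createMetrics(s"$param L1 regularization parameter", scaledDataCats, model)
}
l1Results.foreach { case (param, auc) =>
  println(f"$param, AUC = ${auc * 100}%2.2f%%")
}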
Decision trees
The decision tree model we trained earlier was the best performer on the raw data that we first used. We set a parameter called maxDepth, which controls the maximum depth of the tree and, thus, the complexity of the model. Deeper trees result in more complex models that will be able to fit the data better.
For classification problems, we can also select between two measures of impurity: Gini and Entropy.
Tuning tree depth and impurity
We will illustrate the impact of tree depth in a similar manner as we did for our logistic regression model.
First, we will need to create another helper function in the Spark shell:
import org.apache.spark.mllib.tree.impurity.Impurity
import org.apache.spark.mllib.tree.impurity.Entropy
import org.apache.spark.mllib.tree.impurity.Gini
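The helper itself might look like the sketch below, which wraps MLlib's DecisionTree.train so that maxDepth and the impurity measure can be varied in one place. The function name trainDTWithParams is our own choice, and the input RDD is assumed to be the LabeledPoint dataset used earlier in the chapter:

import org.apache.spark.mllib.tree.DecisionTree
import org.apache.spark.mllib.tree.configuration.Algo
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Train a classification tree with the given depth and impurity measure
def trainDTWithParams(input: RDD[LabeledPoint], maxDepth: Int,
    impurity: Impurity) = {
  DecisionTree.train(input, Algo.Classification, impurity, maxDepth)
}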