So, we can see that the number of iterations has only a minor impact on the results once a
certain number of iterations has been completed.
Step size
In SGD, the step size parameter controls how far in the direction of steepest descent (that
is, along the negative gradient) the algorithm steps when updating the model weight vector
after each training example. A larger step size might speed up convergence, but a step size
that is too large can cause convergence problems, as good solutions are overshot.
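In symbols, this is the standard per-example SGD update, where α is the step size and L is
the per-example loss (note that MLlib's built-in updaters additionally scale the step size
by a factor of 1/√t across iterations t):

w(t+1) = w(t) − α ∇L(w(t); x_i, y_i)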
We can see the impact of changing the step size here:
val stepResults = Seq(0.001, 0.01, 0.1, 1.0, 10.0).map { param =>
  // hold regularization at 0.0 and vary only the step size
  val model = trainWithParams(scaledDataCats, 0.0, numIterations,
    new SimpleUpdater, param)
  createMetrics(s"$param step size", scaledDataCats, model)
}
stepResults.foreach { case (param, auc) =>
  println(f"$param, AUC = ${auc * 100}%2.2f%%")
}
This will give us the following results, which show that increasing the step size too much
can begin to negatively impact performance.
0.001 step size, AUC = 64.95%
0.01 step size, AUC = 65.00%
0.1 step size, AUC = 65.52%
1.0 step size, AUC = 66.55%
10.0 step size, AUC = 61.92%
Regularization
We briefly touched on the Updater class in the preceding logistic regression code. In
MLlib, regularization is implemented by an Updater class. Regularization can help avoid
overfitting a model to the training data by effectively penalizing model complexity. This
is done by adding a term to the loss function that grows with the size of the weights in
the model weight vector, so that more complex models incur a higher loss.
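Concretely, the regularized objective has the standard form

f(w) = (1/n) Σ_i L(w; x_i, y_i) + λ R(w)

where λ is the regularization parameter (the regParam argument to the trainWithParams
helper used earlier) and R(w) is the penalty term, for example R(w) = ½‖w‖₂² for L2
regularization or R(w) = ‖w‖₁ for L1 regularization.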
Regularization is almost always required in real use cases, but is of particular importance
when the feature dimension is very high (that is, the effective number of variable weights
that can be learned is high) relative to the number of training examples.
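As a sketch of how this looks in practice, we can reuse the same trainWithParams and
createMetrics helpers as above, this time holding the step size fixed and varying the
regularization parameter with MLlib's SquaredL2Updater (the parameter values here are
illustrative, not prescriptive):

import org.apache.spark.mllib.optimization.SquaredL2Updater

val regResults = Seq(0.001, 0.01, 0.1, 1.0, 10.0).map { param =>
  // vary the L2 regularization parameter, holding the step size fixed at 1.0
  val model = trainWithParams(scaledDataCats, param, numIterations,
    new SquaredL2Updater, 1.0)
  createMetrics(s"$param L2 regularization parameter",
    scaledDataCats, model)
}
regResults.foreach { case (param, auc) =>
  println(f"$param, AUC = ${auc * 100}%2.2f%%")
}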