When regularization is absent or low, most models tend to over-fit the training dataset; this is a key reason for using cross-validation techniques when fitting models (which we will cover shortly). Conversely, since applying regularization encourages simpler models, performance can suffer when regularization is too high, as the model under-fits the data.
The forms of regularization available in MLlib are as follows (a brief usage sketch follows this list):
• SimpleUpdater: This equates to no regularization and is the default for logistic regression
• SquaredL2Updater: This implements a regularizer based on the squared L2-norm of the weight vector; this is the default for SVM models
• L1Updater: This applies a regularizer based on the L1-norm of the weight vector; this can lead to sparse solutions in the weight vector (as less important weights are pulled towards zero)
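In MLlib's SGD-based models, the updater is configured on the model's optimizer. The following is a minimal sketch, not taken from this chapter's listings, of how the L1Updater could be swapped in on a logistic regression model; the iteration count and regularization parameter here are illustrative values:
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.optimization.L1Updater

// Configure the SGD optimizer to apply L1 regularization;
// 0.1 is an illustrative regularization parameter
val lrL1 = new LogisticRegressionWithSGD
lrL1.optimizer
  .setNumIterations(10)
  .setUpdater(new L1Updater)
  .setRegParam(0.1)
// Calling lrL1.run(trainingData) would then train a model whose
// less important weights are pulled towards zero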
Note
Regularization and its relation to optimization are broad and heavily researched areas. Some more information is available from the following links:
• General regularization overview: http://en.wikipedia.org/wiki/Regularization_(mathematics)
• L2 regularization: http://en.wikipedia.org/wiki/Tikhonov_regularization
• Over-fitting and under-fitting: http://en.wikipedia.org/wiki/Overfitting
• Detailed overview of over-fitting and L1 versus L2 regularization: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.92.9860&rep=rep1&type=pdf
Let's explore the impact of a range of regularization parameters using SquaredL2Updater:
val regResults = Seq(0.001, 0.01, 0.1, 1.0, 10.0).map { param =>
  // Train on the scaled dataset with L2 regularization,
  // varying only the regularization parameter
  val model = trainWithParams(scaledDataCats, param, numIterations, new SquaredL2Updater, 1.0)
  createMetrics(s"$param L2 regularization parameter", scaledDataCats, model)
}
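This snippet relies on the trainWithParams and createMetrics helpers defined earlier in the chapter. For reference, here is a minimal sketch of what such helpers might look like (the exact earlier definitions may differ):
import org.apache.spark.mllib.classification.{ClassificationModel, LogisticRegressionWithSGD}
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.optimization.Updater
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Train a logistic regression model with the given optimizer settings
def trainWithParams(input: RDD[LabeledPoint], regParam: Double, numIterations: Int, updater: Updater, stepSize: Double) = {
  val lr = new LogisticRegressionWithSGD
  lr.optimizer
    .setNumIterations(numIterations)
    .setUpdater(updater)
    .setRegParam(regParam)
    .setStepSize(stepSize)
  lr.run(input)
}

// Compute the AUC for a trained model on the given dataset
def createMetrics(label: String, data: RDD[LabeledPoint], model: ClassificationModel) = {
  val scoreAndLabels = data.map { point =>
    (model.predict(point.features), point.label)
  }
  val metrics = new BinaryClassificationMetrics(scoreAndLabels)
  (label, metrics.areaUnderROC)
}
The resulting (label, AUC) pairs can then be printed to compare the settings, for example:
regResults.foreach { case (param, auc) => println(f"$param, AUC = ${auc * 100}%2.2f%%") }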