out fold. This is repeated K times, and the results are averaged to give the cross-validation
score. The train-test split is effectively like two-fold cross-validation.
Other approaches include leave-one-out cross-validation and random sampling. See the
article at http://en.wikipedia.org/wiki/Cross-validation_(statistics) for further details.
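The K-fold idea described above can be sketched in a few lines of plain Scala. The `kFoldIndices` helper below is purely illustrative (it is not a Spark API; Spark's own `MLUtils.kFold` provides comparable functionality on RDDs): it partitions the record indices into K folds and, for each fold, treats that fold as the held-out test set and the remaining records as the training set.

```scala
object KFoldSketch {
  // Illustrative helper: split indices 0 until n into k roughly equal folds.
  // For each fold, return (trainIndices, testIndices), where the fold itself
  // is held out for testing and all other indices form the training set.
  def kFoldIndices(n: Int, k: Int): Seq[(Seq[Int], Seq[Int])] = {
    val foldSize = math.ceil(n.toDouble / k).toInt
    val folds = (0 until n).grouped(foldSize).map(_.toSeq).toSeq
    folds.map { testIdx =>
      val trainIdx = (0 until n).filterNot(testIdx.contains)
      (trainIdx, testIdx)
    }
  }

  def main(args: Array[String]): Unit = {
    // 6 data points, 3 folds: each point appears in exactly one test fold
    kFoldIndices(6, 3).foreach { case (train, test) =>
      println(s"train=$train test=$test")
    }
  }
}
```

In practice we would train a model on each `trainIdx` subset, evaluate it on the corresponding `testIdx` subset, and average the K evaluation scores to obtain the cross-validation score.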
First, we will split our dataset into a 60 percent training set and a 40 percent test set (we
will use a constant random seed of 123 here to ensure that we get the same results for ease
of illustration):
val trainTestSplit = scaledDataCats.randomSplit(Array(0.6, 0.4), 123)
val train = trainTestSplit(0)
val test = trainTestSplit(1)
Next, we will compute the evaluation metric of interest (again, we will use AUC) for a
range of regularization parameter settings. Note that here we will use a finer-grained step
size between the evaluated regularization parameters to better illustrate the differences in
AUC, which are very small in this case:
val regResultsTest = Seq(0.0, 0.001, 0.0025, 0.005, 0.01).map { param =>
  val model = trainWithParams(train, param, numIterations, new SquaredL2Updater, 1.0)
  createMetrics(s"$param L2 regularization parameter", test, model)
}
regResultsTest.foreach { case (param, auc) =>
  println(f"$param, AUC = ${auc * 100}%2.6f%%")
}
This will train a model on the training set for each regularization setting and evaluate it on the test set, producing the following output:
0.0 L2 regularization parameter, AUC = 66.480874%
0.001 L2 regularization parameter, AUC = 66.480874%
0.0025 L2 regularization parameter, AUC = 66.515027%
0.005 L2 regularization parameter, AUC = 66.515027%
0.01 L2 regularization parameter, AUC = 66.549180%
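Since `createMetrics` returns a (label, AUC) pair for each setting, selecting the best-performing regularization parameter is a one-liner over the `regResultsTest` sequence from the code above (a minimal sketch, assuming those pairs are in scope):

```scala
// Illustrative standalone sketch: pick the setting with the highest AUC
// from (label, auc) pairs like those produced by createMetrics above.
object BestParamSketch {
  def main(args: Array[String]): Unit = {
    val regResultsTest = Seq(
      ("0.0 L2 regularization parameter", 0.66480874),
      ("0.001 L2 regularization parameter", 0.66480874),
      ("0.0025 L2 regularization parameter", 0.66515027),
      ("0.005 L2 regularization parameter", 0.66515027),
      ("0.01 L2 regularization parameter", 0.66549180)
    )
    // maxBy on the AUC component selects the best setting
    val (bestLabel, bestAuc) = regResultsTest.maxBy { case (_, auc) => auc }
    println(f"Best: $bestLabel, AUC = ${bestAuc * 100}%2.6f%%")
  }
}
```

For the results shown here, this would select the 0.01 setting, although the differences between the settings are very small.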