Finally, we will create a second helper function to take the input data and a classification
model and generate the relevant AUC metrics:
def createMetrics(label: String, data: RDD[LabeledPoint],
    model: ClassificationModel) = {
  val scoreAndLabels = data.map { point =>
    (model.predict(point.features), point.label)
  }
  val metrics = new BinaryClassificationMetrics(scoreAndLabels)
  (label, metrics.areaUnderROC)
}
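The experiments that follow call a trainWithParams helper defined earlier in the chapter, which is not shown in this excerpt. For reference, a minimal sketch of such a helper, assuming logistic regression with SGD is the model being trained (the exact definition in the full chapter may differ):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.optimization.Updater
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Train a logistic regression model with SGD, exposing the optimizer
// settings (regularization, iterations, updater, step size) that the
// parameter-tuning experiments vary one at a time.
def trainWithParams(input: RDD[LabeledPoint], regParam: Double,
    numIterations: Int, updater: Updater, stepSize: Double) = {
  val lr = new LogisticRegressionWithSGD
  lr.optimizer
    .setNumIterations(numIterations)
    .setUpdater(updater)
    .setRegParam(regParam)
    .setStepSize(stepSize)
  lr.run(input)
}
```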
We will also cache our scaled dataset, including categories, to speed up the multiple model training runs that we will be using to explore these different parameter settings:
scaledDataCats.cache
Iterations
Many machine learning methods are iterative in nature, converging to a solution (the optimal weight vector that minimizes the chosen loss function) over a number of iteration
steps. SGD typically requires relatively few iterations to converge to a reasonable solution
but can be run for more iterations to improve the solution. We can see this by trying a few
different settings for the numIterations parameter and comparing the AUC results:
val iterResults = Seq(1, 5, 10, 50).map { param =>
  val model = trainWithParams(scaledDataCats, 0.0, param,
    new SimpleUpdater, 1.0)
  createMetrics(s"$param iterations", scaledDataCats, model)
}
iterResults.foreach { case (param, auc) =>
  println(f"$param, AUC = ${auc * 100}%2.2f%%")
}
Your output should look like this:
1 iterations, AUC = 64.97%
5 iterations, AUC = 66.62%
10 iterations, AUC = 66.55%
50 iterations, AUC = 66.81%