Finally, we will create a second helper function to take the input data and a classification
model and generate the relevant AUC metrics:
def createMetrics(label: String, data: RDD[LabeledPoint],
    model: ClassificationModel) = {
  val scoreAndLabels = data.map { point =>
    (model.predict(point.features), point.label)
  }
  val metrics = new BinaryClassificationMetrics(scoreAndLabels)
  (label, metrics.areaUnderROC)
}
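The experiments that follow call a trainWithParams helper defined earlier in the chapter, which is not shown in this excerpt. For reference, a minimal sketch of such a helper, assuming logistic regression with SGD is the model being trained (the exact definition in the full chapter may differ):

```scala
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.optimization.Updater
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Train a logistic regression model with SGD, exposing the optimizer
// settings (regularization, iterations, updater, step size) that the
// parameter-tuning experiments vary one at a time.
def trainWithParams(input: RDD[LabeledPoint], regParam: Double,
    numIterations: Int, updater: Updater, stepSize: Double) = {
  val lr = new LogisticRegressionWithSGD
  lr.optimizer
    .setNumIterations(numIterations)
    .setUpdater(updater)
    .setRegParam(regParam)
    .setStepSize(stepSize)
  lr.run(input)
}
```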
We will also cache our scaled dataset, including categories, to speed up the multiple model training runs that we will be using to explore these different parameter settings:
scaledDataCats.cache
Iterations
Many machine learning methods are iterative in nature, converging to a solution (the optimal weight vector that minimizes the chosen loss function) over a number of iteration
steps. SGD typically requires relatively few iterations to converge to a reasonable solution
but can be run for more iterations to improve the solution. We can see this by trying a few
different settings for the numIterations parameter and comparing the AUC results:
val iterResults = Seq(1, 5, 10, 50).map { param =>
  val model = trainWithParams(scaledDataCats, 0.0, param,
    new SimpleUpdater, 1.0)
  createMetrics(s"$param iterations", scaledDataCats, model)
}
iterResults.foreach { case (param, auc) =>
  println(f"$param, AUC = ${auc * 100}%2.2f%%")
}
Your output should look like this:
1 iterations, AUC = 64.97%
5 iterations, AUC = 66.62%
10 iterations, AUC = 66.55%
50 iterations, AUC = 66.81%