We can verify this by subtracting the mean from the first feature and dividing the result by the square root of the variance (which we computed earlier):
// standardize the first feature value by hand: (value - mean) / sqrt(variance)
println((0.789131 - 0.41225805299526636) / math.sqrt(0.1097424416755897))
The result should be equal to the first element of our scaled vector:
1.137647336497682
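For context, the scaled dataset used here would typically be produced with MLlib's StandardScaler, fitted on the raw feature vectors and then applied to each record. The following is a minimal sketch, assuming data is an RDD[LabeledPoint] from earlier in the chapter; the variable names vectors and scaler are illustrative rather than taken from the original text:
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.LabeledPoint

// fit the scaler on the raw feature vectors
// (withMean subtracts the column mean, withStd divides by the standard deviation)
val vectors = data.map(lp => lp.features)
val scaler = new StandardScaler(withMean = true, withStd = true).fit(vectors)

// apply the same transformation to every record, keeping the labels unchanged
val scaledData = data.map(lp =>
  LabeledPoint(lp.label, scaler.transform(lp.features)))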
We can now retrain our model using the standardized data. We will use only the logistic
regression model to illustrate the impact of feature standardization (since the decision tree
and naïve Bayes are not impacted by this):
val lrModelScaled = LogisticRegressionWithSGD.train(scaledData, numIterations)
val lrTotalCorrectScaled = scaledData.map { point =>
  if (lrModelScaled.predict(point.features) == point.label) 1 else 0
}.sum
val lrAccuracyScaled = lrTotalCorrectScaled / numData
val lrPredictionsVsTrue = scaledData.map { point =>
  (lrModelScaled.predict(point.features), point.label)
}
val lrMetricsScaled = new BinaryClassificationMetrics(lrPredictionsVsTrue)
val lrPr = lrMetricsScaled.areaUnderPR
val lrRoc = lrMetricsScaled.areaUnderROC
println(f"${lrModelScaled.getClass.getSimpleName}\nAccuracy: ${lrAccuracyScaled * 100}%2.4f%%\nArea under PR: ${lrPr * 100.0}%2.4f%%\nArea under ROC: ${lrRoc * 100.0}%2.4f%%")
The result should look similar to this:
LogisticRegressionModel
Accuracy: 62.0419%
Area under PR: 72.7254%
Area under ROC: 61.9663%
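To make the impact of standardization explicit, these metrics can be compared against those from the unscaled logistic regression run earlier. A minimal sketch, assuming the earlier results were kept in variables named lrAccuracy, lrPrOrig, and lrRocOrig (these names are chosen here for illustration, not taken from the original code):
// assumed: lrAccuracy, lrPrOrig and lrRocOrig hold the accuracy, area under PR
// and area under ROC computed for the unscaled logistic regression model
println(f"Change in accuracy:       ${(lrAccuracyScaled - lrAccuracy) * 100}%2.4f%%")
println(f"Change in area under PR:  ${(lrPr - lrPrOrig) * 100.0}%2.4f%%")
println(f"Change in area under ROC: ${(lrRoc - lrRocOrig) * 100.0}%2.4f%%")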