You will see the following on the screen:
[-0.023261105535492967,2.720728254208072,-0.4464200056407091,-0.2205258360869135,
...
Tip
Note that while the original raw features were sparse (that is, most entries are zero), subtracting the mean from each entry produces a non-sparse (dense) representation, as can be seen in the preceding example.
This is not a problem here, as the data size is small, but large-scale real-world problems often have extremely sparse input data with many features (online advertising and text classification are good examples). In such cases, it is not advisable to lose this sparsity, as the memory and processing requirements for the equivalent dense representation can quickly explode with many millions of features. We can avoid this by using StandardScaler with withMean set to false, as sketched below.
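The following is a minimal sketch of such sparsity-preserving scaling. It assumes an existing RDD[LabeledPoint] of raw sparse features; the names data, sparseScaler, and scaledDataSparse are illustrative, not part of the preceding example:
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.LabeledPoint

// withMean = false skips mean subtraction (which would densify the
// vectors), while withStd = true still scales each feature to unit
// standard deviation, so sparse vectors stay sparse.
// `data` is assumed to be an RDD[LabeledPoint] of raw sparse features.
val sparseScaler = new StandardScaler(withMean = false, withStd = true)
  .fit(data.map(lp => lp.features))
val scaledDataSparse = data.map(lp =>
  LabeledPoint(lp.label, sparseScaler.transform(lp.features)))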
We're now ready to train a new logistic regression model with our expanded feature set and then evaluate its performance:
// train a logistic regression model on the scaled data with the category feature added
val lrModelScaledCats = LogisticRegressionWithSGD.train(scaledDataCats, numIterations)
// count correct predictions to compute overall accuracy
val lrTotalCorrectScaledCats = scaledDataCats.map { point =>
  if (lrModelScaledCats.predict(point.features) == point.label) 1 else 0
}.sum
val lrAccuracyScaledCats = lrTotalCorrectScaledCats / numData
// pair each prediction with the true label for the evaluation metrics
val lrPredictionsVsTrueCats = scaledDataCats.map { point =>
  (lrModelScaledCats.predict(point.features), point.label)
}
val lrMetricsScaledCats = new BinaryClassificationMetrics(lrPredictionsVsTrueCats)
val lrPrCats = lrMetricsScaledCats.areaUnderPR
val lrRocCats = lrMetricsScaledCats.areaUnderROC
println(f"${lrModelScaledCats.getClass.getSimpleName}\nAccuracy: ${lrAccuracyScaledCats * 100}%2.4f%%\nArea under PR: ${lrPrCats * 100.0}%2.4f%%\nArea under ROC: ${lrRocCats * 100.0}%2.4f%%")