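  // The label is the last column of the record
  val label = trimmed(r.size - 1).toInt
  // Look up the index assigned to this record's category (column 3)
  val categoryIdx = categories(r(3))
  // 1-of-k encoding: all zeros except a 1.0 at the category index
  val categoryFeatures = Array.ofDim[Double](numCategories)
  categoryFeatures(categoryIdx) = 1.0
  // Remaining numeric columns; missing values ("?") are replaced with 0.0
  val otherFeatures = trimmed.slice(4, r.size - 1).map(d =>
    if (d == "?") 0.0 else d.toDouble)
  // Category indicators first, followed by the numeric features
  val features = categoryFeatures ++ otherFeatures
  LabeledPoint(label, Vectors.dense(features))
}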
println(dataCategories.first)
You should see output similar to what is shown here. You can see that the first part of our
feature vector is now a vector of length 14 with one nonzero entry at the relevant category
index:
LabeledPoint(0.0,
[0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.789131,2.055555556,0.676470588,0.205882353,0.047058824,0.023529412,0.443783175,0.0,0.0,0.09077381,0.0,0.245831182,0.003883495,1.0,1.0,24.0,0.0,5424.0,170.0,8.0,0.152941176,0.079129575])
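The 1-of-k encoding behind those first 14 entries can be sketched without Spark. This is only an illustration meant for the Scala shell: the object name OneHotSketch, its helper functions, and the three category names are hypothetical and are not taken from the dataset.

```scala
// Minimal, Spark-free sketch of 1-of-k (one-hot) encoding.
// The category names used below are made-up sample values.
object OneHotSketch {
  // Assign each distinct category an index; sorting keeps the mapping deterministic.
  def buildIndex(values: Seq[String]): Map[String, Int] =
    values.distinct.sorted.zipWithIndex.toMap

  // Vector of length index.size with a single 1.0 at the category's index.
  // Throws NoSuchElementException if the category is not in the index.
  def encode(category: String, index: Map[String, Int]): Array[Double] = {
    val vec = Array.ofDim[Double](index.size)
    vec(index(category)) = 1.0
    vec
  }
}

val sampleCategories = Seq("business", "recreation", "health", "business")
val index = OneHotSketch.buildIndex(sampleCategories)
val encoded = OneHotSketch.encode("health", index)
println(encoded.mkString("[", ",", "]")) // prints [0.0,1.0,0.0]
```

With 14 distinct categories, the same procedure would yield a length-14 block with exactly one nonzero entry, as in the output above.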
Again, since our raw features are not standardized, we should standardize them with the
same StandardScaler approach that we used earlier. We do this before training a new
model on this expanded dataset:
val scalerCats = new StandardScaler(withMean = true,
  withStd = true).fit(dataCategories.map(lp => lp.features))
val scaledDataCats = dataCategories.map(lp =>
  LabeledPoint(lp.label, scalerCats.transform(lp.features)))
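To make the transformation concrete, here is a rough, Spark-free sketch of column-wise standardization with both the mean and the standard deviation applied. The object StandardizeSketch and its functions are hypothetical names. The sketch assumes the corrected sample standard deviation (dividing by n - 1) and maps columns with zero deviation to 0.0; as far as I know this matches how MLlib's StandardScaler behaves, but treat both points as assumptions rather than a description of its internals.

```scala
// Spark-free sketch of standardization: subtract the column mean,
// then divide by the column standard deviation.
object StandardizeSketch {
  // Per-column mean of a non-empty dataset whose rows all have the same length.
  def columnMeans(rows: Seq[Array[Double]]): Array[Double] = {
    val n = rows.size.toDouble
    Array.tabulate(rows.head.length)(j => rows.map(_(j)).sum / n)
  }

  // Per-column sample standard deviation (n - 1 denominator; needs at least two rows).
  def columnStds(rows: Seq[Array[Double]], means: Array[Double]): Array[Double] = {
    val n = rows.size.toDouble
    Array.tabulate(means.length) { j =>
      val sumSq = rows.map(r => math.pow(r(j) - means(j), 2)).sum
      math.sqrt(sumSq / (n - 1))
    }
  }

  // Standardize one row; a column with zero deviation is mapped to 0.0 (assumption).
  def transform(row: Array[Double], means: Array[Double],
                stds: Array[Double]): Array[Double] =
    row.indices.map { j =>
      if (stds(j) != 0.0) (row(j) - means(j)) / stds(j) else 0.0
    }.toArray
}

val rows = Seq(Array(1.0, 10.0), Array(2.0, 10.0), Array(3.0, 10.0))
val means = StandardizeSketch.columnMeans(rows)
val stds = StandardizeSketch.columnStds(rows, means)
val scaled = rows.map(r => StandardizeSketch.transform(r, means, stds))
scaled.foreach(r => println(r.mkString("[", ",", "]")))
```

In this toy input the first column has mean 2.0 and sample standard deviation 1.0, so it becomes -1.0, 0.0, 1.0, while the constant second column becomes all zeros. The one-hot category columns in our dataset are rescaled in the same way, so after scaling they no longer hold plain 0.0 and 1.0 values.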
We can inspect the features before and after scaling as we did earlier:
println(dataCategories.first.features)
The output is as follows:
0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.789131,2.055555556
...
The following code will print the features after scaling:
println(scaledDataCats.first.features)