at RDDFunctions.scala:111, took 0.495859 s
scaler: org.apache.spark.mllib.feature.StandardScalerModel = org.apache.spark.mllib.feature.StandardScalerModel@6bb1a1a1
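The output above is what the Spark shell prints once the scaler has been fit. For context, a minimal sketch of the fitting step looks like the following (the vectors RDD of image vectors and the exact parameters are assumed from the surrounding context):
import org.apache.spark.mllib.feature.StandardScaler
// Fit a scaler that subtracts the per-column mean but leaves the
// variance untouched; fit() runs a Spark job over the data, which
// produces the timing line shown above.
val scaler = new StandardScaler(withMean = true, withStd = false).fit(vectors)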
Tip
Note that subtracting the mean is straightforward for dense input data. For sparse vectors, however, subtracting the (generally dense) mean vector from each input will turn the sparse data into dense data. For very high-dimensional input, this can easily exhaust the available memory, so it is not advisable.
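To see why, consider a toy illustration (hypothetical dimensions and values, not part of the pipeline above):
import org.apache.spark.mllib.linalg.{DenseVector, Vectors}
// A mostly-zero vector stored sparsely: only 2 of 10,000 entries are
// non-zero, so only those values and their indices are kept in memory.
val sparse = Vectors.sparse(10000, Seq((0, 1.0), (42, 3.0)))
// Subtracting a non-zero mean makes every component non-zero, so the
// result can only be represented densely: all 10,000 doubles are stored.
val mean = 0.5
val centered = new DenseVector(sparse.toArray.map(_ - mean))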
Finally, we will use the returned scaler to transform the raw image vectors to vectors
with the column means subtracted:
val scaledVectors = vectors.map(v => scaler.transform(v))
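As a quick sanity check (an illustrative snippet, assuming the RDDs above), you can compare the first few elements of a raw vector with its scaled counterpart; each scaled value should equal the raw value minus that column's mean:
println(vectors.first.toArray.take(5).mkString(", "))
println(scaledVectors.first.toArray.take(5).mkString(", "))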
We mentioned earlier that the resized grayscale images would take up around 10 MB of memory. Indeed, you can take a look at the memory usage on the Storage page of the Spark application monitor by going to http://localhost:4040/storage/ in your web browser.
Since we gave our RDD of image vectors a friendly name of image-vectors, you should see something like the following screenshot (note that as we are using Vector[Double], each element takes up 8 bytes instead of 4 bytes; hence, we actually use 20 MB of memory):
Size of image vectors in memory
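For reference, the friendly name shown on the Storage page comes from naming (and caching) the RDD, which is typically done along these lines (a sketch, assuming the vectors RDD above):
// Name the RDD so it is identifiable on the /storage page, then cache
// it so Spark keeps the partitions in memory and reports their size.
vectors.setName("image-vectors")
vectors.cache()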