Running PCA on the LFW dataset
Now that we have extracted our image pixel data into vectors, we can instantiate a new
RowMatrix and call the computePrincipalComponents method to compute the
top K principal components of our distributed matrix:
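import org.apache.spark.mllib.linalg.Matrix
import org.apache.spark.mllib.linalg.distributed.RowMatrix
// Wrap the RDD of image vectors in a distributed, row-oriented matrix
val matrix = new RowMatrix(scaledVectors)
// Number of principal components to keep
val K = 10
// Returns a local Matrix whose columns are the top K principal components
val pc = matrix.computePrincipalComponents(K)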
You will likely see quite a lot of output in your console while the model runs.
Tip
If you see warnings such as WARN LAPACK: Failed to load implementation from:
com.github.fommil.netlib.NativeSystemLAPACK or WARN LAPACK: Failed to load
implementation from: com.github.fommil.netlib.NativeRefLAPACK , you can safely
ignore these.
This means that the underlying linear algebra libraries used by MLlib could not load native
routines. In this case, a Java-based fallback will be used, which is slower, but there is
nothing to worry about for the purposes of this example.
Once the computation is complete, you should see a result displayed in the console that
looks similar to the following:
pc: org.apache.spark.mllib.linalg.Matrix =
-0.023183157256614906 -0.010622723054037303 ... (10 total)
-0.023960537953442107 -0.011495966728461177 ...
-0.024397470862198022 -0.013512219690177352 ...
-0.02463158818330343 -0.014758658113862178 ...
-0.024941633606137027 -0.014878858729655142 ...
-0.02525998879466241 -0.014602750644394844 ...
-0.025494722450369593 -0.014678013626511024 ...
-0.02604194423255582 -0.01439561589951032 ...
-0.025942214214865228 -0.013907665261197633 ...
-0.026151551334429365 -0.014707035797934148 ...
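To try the same calls without the LFW data, the following is a minimal sketch that runs computePrincipalComponents on a tiny made-up matrix in local mode and reports the shape of the result. The object name PcaSketch, the helper principalComponentDims, and the four three-dimensional toy vectors are assumptions introduced purely for illustration; only RowMatrix and computePrincipalComponents come from the text above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.linalg.{Matrix, Vectors}
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// Hypothetical helper object, not part of the original example.
object PcaSketch {

  // Builds a small RowMatrix from made-up data and returns the
  // (rows, columns) of the principal components matrix.
  def principalComponentDims(sc: SparkContext, k: Int): (Int, Int) = {
    // Four toy observations with three features each (illustrative values only)
    val rows = sc.parallelize(Seq(
      Vectors.dense(1.0, 2.0, 3.0),
      Vectors.dense(2.0, 4.0, 1.0),
      Vectors.dense(3.0, 1.0, 5.0),
      Vectors.dense(4.0, 3.0, 2.0)
    ))
    val toyMatrix = new RowMatrix(rows)
    // The result is a local matrix: one row per input feature,
    // one column per principal component
    val toyPc: Matrix = toyMatrix.computePrincipalComponents(k)
    (toyPc.numRows, toyPc.numCols)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("pca-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (numRows, numCols) = principalComponentDims(sc, 2)
      println(s"Principal components matrix: $numRows x $numCols")
    } finally {
      sc.stop()
    }
  }
}
```

In the Spark shell, where sc already exists, the helper can be called directly as PcaSketch.principalComponentDims(sc, 2). The same numRows and numCols accessors can be applied to the pc matrix computed above to check that it has one column for each of the K components.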