Evaluating k for SVD on the LFW dataset
We will examine the singular values obtained from computing the SVD on our image data.
We can verify that the leading singular values are the same for each value of k (up to
small floating-point differences) and that they are returned in decreasing order, as follows:
val sValues = (1 to 5).map { i =>
  matrix.computeSVD(i, computeU = false).s
}
sValues.foreach(println)
This should show us output similar to the following:
[54091.00997110354]
[54091.00997110358,33757.702867982436]
[54091.00997110357,33757.70286798241,24541.193694775946]
[54091.00997110358,33757.70286798242,24541.19369477593,23309.58418888302]
[54091.00997110358,33757.70286798242,24541.19369477593,23309.584188882982,21803.09841158358]
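The same check can be made programmatically rather than by eye. The following is a minimal sketch in Python with NumPy, assuming the printed values have been copied into a list by hand; the helper names is_decreasing and agrees_with_longest, and the tolerance rtol, are illustrative choices and not part of the Spark API:

```python
import numpy as np

# Singular values copied from the output above, one list per value of k.
runs = [
    [54091.00997110354],
    [54091.00997110358, 33757.702867982436],
    [54091.00997110357, 33757.70286798241, 24541.193694775946],
    [54091.00997110358, 33757.70286798242, 24541.19369477593,
     23309.58418888302],
    [54091.00997110358, 33757.70286798242, 24541.19369477593,
     23309.584188882982, 21803.09841158358],
]

def is_decreasing(values):
    """True if every value is strictly smaller than the one before it."""
    return bool(np.all(np.diff(values) < 0))

def agrees_with_longest(values, reference, rtol=1e-9):
    """True if `values` matches the leading entries of `reference`
    within a relative tolerance (hypothetical default of 1e-9)."""
    return bool(np.allclose(values, reference[:len(values)],
                            rtol=rtol, atol=0.0))

reference = runs[-1]
all_decreasing = all(is_decreasing(r) for r in runs)
all_consistent = all(agrees_with_longest(r, reference) for r in runs)
print(all_decreasing, all_consistent)
```

The runs differ only in the last few digits, so an exact equality test would fail while a tolerance-based comparison passes.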
As with evaluating values of k for clustering, in the case of SVD (and PCA), it is often
useful to plot the singular values for a larger range of k and look for the point on the
graph where the amount of additional variance accounted for by each additional singular
value starts to flatten out considerably.
We will do this by first computing the top 300 singular values:
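import java.io.File
import breeze.linalg.{DenseMatrix, csvwrite}

// Compute only the singular values; U is not needed here
val svd300 = matrix.computeSVD(300, computeU = false)
// Wrap the 300 singular values in a 1 x 300 Breeze matrix so csvwrite can save them
val sMatrix = new DenseMatrix(1, 300, svd300.s.toArray)
csvwrite(new File("/tmp/s.csv"), sMatrix)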
This writes the vector S of singular values to a temporary CSV file (as we did for our
matrix of Eigenfaces previously). We then read it back in our IPython console and plot
the singular value for each k:
s = np.loadtxt("/tmp/s.csv", delimiter=",")
print(s.shape)
plot(s)
You should see an image displayed similar to the one shown here:
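Alongside the plot, the point where the curve flattens can be estimated numerically. The following is a rough sketch, not the book's method: it treats the squared singular values as a proxy for variance (which holds only if the data was centered) and measures each k relative to the values that were computed, not the full spectrum. The function names cumulative_energy and smallest_k and the 0.9 threshold are hypothetical:

```python
import numpy as np

def cumulative_energy(s):
    """Cumulative share of the sum of squared singular values."""
    s = np.asarray(s, dtype=float)
    energy = s ** 2
    return np.cumsum(energy) / energy.sum()

def smallest_k(s, threshold=0.9):
    """Smallest k whose cumulative share reaches `threshold`
    (an arbitrary cut-off chosen for illustration)."""
    cum = cumulative_energy(s)
    k = int(np.searchsorted(cum, threshold)) + 1
    # Guard against floating-point round-off at the very end of the array.
    return min(k, len(cum))

# Demo on the five singular values printed earlier; in the IPython session
# you would use s = np.loadtxt("/tmp/s.csv", delimiter=",") instead.
s_demo = [54091.00997110358, 33757.70286798242, 24541.19369477593,
          23309.584188882982, 21803.09841158358]
print(cumulative_energy(s_demo))
print(smallest_k(s_demo, threshold=0.9))
```

A threshold-based cut-off is only one heuristic; visual inspection of the plot remains a reasonable way to choose k.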