Next, let us apply this formula to a matrix $M$ whose SVD is $M = U \Sigma V^T$. Let the $i$th
diagonal element of $\Sigma$ be $\sigma_i$, and suppose we preserve the first $n$ of the $r$
diagonal elements of $\Sigma$, setting the rest to 0. Let $\Sigma'$ be the resulting diagonal
matrix, and let $M' = U \Sigma' V^T$ be the resulting approximation to $M$. Then
$M - M' = U (\Sigma - \Sigma') V^T$ is the matrix giving the errors that result from our
approximation.
If we apply Equation 11.5 to the matrix $M - M'$, we see that $\|M - M'\|^2$ equals the sum
of the squares of the diagonal elements of $\Sigma - \Sigma'$. But $\Sigma - \Sigma'$ has 0 for
the first $n$ diagonal elements and $\sigma_i$ for the $i$th diagonal element, where
$n < i \le r$. That is, $\|M - M'\|^2$ is the sum of the squares of the elements of $\Sigma$
that were set to 0. To minimize $\|M - M'\|^2$, pick those elements to be the smallest in
$\Sigma$. Doing so gives the least possible value of $\|M - M'\|^2$ under the constraint that
we preserve $n$ of the diagonal elements, and it therefore minimizes the RMSE under the same
constraint.
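The claim above can be checked numerically. The following is a minimal sketch, not part of the original text: it assumes the norm in question is the Frobenius norm (as the link to RMSE suggests), uses NumPy's `np.linalg.svd`, and runs on an arbitrary random matrix chosen only for illustration. The helper name `truncation_error` is hypothetical.

```python
import numpy as np

def truncation_error(M, n):
    """Keep the first n singular values of M and compare two quantities:
    the squared Frobenius norm of M - M', and the sum of the squares of
    the singular values that were set to 0."""
    # np.linalg.svd returns the singular values in descending order,
    # so zeroing s[n:] discards the smallest ones.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_kept = s.copy()
    s_kept[n:] = 0.0                     # diagonal of Sigma'
    M_approx = (U * s_kept) @ Vt         # M' = U Sigma' V^T
    frob_sq = np.sum((M - M_approx) ** 2)
    dropped_sq = np.sum(s[n:] ** 2)
    return frob_sq, dropped_sq

# Illustrative data only: a random 7 x 5 matrix, not the book's example.
rng = np.random.default_rng(0)
M = rng.normal(size=(7, 5))

frob_sq, dropped_sq = truncation_error(M, n=2)
print(frob_sq, dropped_sq)               # the two values agree up to rounding
```

Because the singular values come back sorted, keeping the first n is the same as dropping the smallest ones, which is the choice the argument above identifies as optimal.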
11.3.5 Querying Using Concepts
In this section we shall look at how SVD can help us answer certain queries efficiently, with
good accuracy. Let us assume, for example, that we have decomposed our original movie-rating
data (the rank-2 data of Fig. 11.6) into the SVD form of Fig. 11.7. Quincy is not one
of the people represented by the original matrix, but he wants to use the system to know
what movies he would like. He has only seen one movie, The Matrix, and rated it 4. Thus,
we can represent Quincy by the vector $q = [4, 0, 0, 0, 0]$, as if this were one of the rows of
the original matrix.
If we used a collaborative-filtering approach, we would try to compare Quincy with the
other users represented in the original matrix $M$. Instead, we can map Quincy into “concept
space” by multiplying his vector $q$ by the matrix $V$ of the decomposition. We find
$qV = [2.32, 0]$. That is to say, Quincy is high in science-fiction interest, and not at all
interested in romance.
We now have a representation of Quincy in concept space, derived from, but different
from, his representation in the original “movie space.” One useful thing we can do is to
map his representation back into movie space by multiplying $[2.32, 0]$ by $V^T$. This product
is $[1.35, 1.35, 1.35, 0, 0]$. It suggests that Quincy would like Alien and Star Wars, but not
Casablanca or Titanic.
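The two mappings can be reproduced in a few lines. This is a sketch under stated assumptions rather than the book's own computation: the matrix `V` below is not given in this section, so its entries (0.58 and 0.71, rounded to two decimals) and the column order of the movies are assumptions chosen to be consistent with the numbers quoted in the text.

```python
import numpy as np

# ASSUMPTION: V (movies x concepts), rounded to two decimals. Rows are
# assumed to be ordered The Matrix, Alien, Star Wars, Casablanca, Titanic;
# columns are the science-fiction and romance concepts.
V = np.array([
    [0.58, 0.00],
    [0.58, 0.00],
    [0.58, 0.00],
    [0.00, 0.71],
    [0.00, 0.71],
])

# Quincy rated only The Matrix, with a 4.
q = np.array([4.0, 0.0, 0.0, 0.0, 0.0])

# Movie space -> concept space: multiply the row vector q by V.
q_concept = q @ V

# Concept space -> movie space: multiply by V^T.
q_movies = q_concept @ V.T

print(np.round(q_concept, 2))   # compare with [2.32, 0] in the text
print(np.round(q_movies, 2))    # compare with [1.35, 1.35, 1.35, 0, 0]
```

The round trip spreads Quincy's single rating across every movie that loads on the same concept, which is why the unseen science-fiction titles receive nonzero scores while the romance titles stay at zero.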
Another sort of query we can perform in concept space is to find users similar to Quincy.
We can use V to map all users into concept space. For example, Joe maps to [1.74, 0], and
Jill maps to [0, 5.68]. Notice that in this simple example, all users are either 100% science-
fiction fans or 100% romance fans, so each vector has a zero in one component. In reality,
people are more complex, and they will have different, but nonzero, levels of interest in