distance of vectors is not affected by components in which both vectors have 0, we need
not worry about the effect of actors that are in neither movie.
The last component shown represents the average rating. We have shown it as having an
unknown scaling factor α. In terms of α, we can compute the cosine of the angle between
the vectors. The dot product is $2 + 12\alpha^2$, and the lengths of the vectors are
$\sqrt{5 + 9\alpha^2}$ and $\sqrt{5 + 16\alpha^2}$. Thus,
the cosine of the angle between the vectors is
$$\frac{2 + 12\alpha^2}{\sqrt{25 + 125\alpha^2 + 144\alpha^4}}$$
If we choose α = 1, that is, we take the average ratings as they are, then the value of the
above expression is 0.816. If we use α = 2, that is, we double the ratings, then the cosine is
0.940. That is, the vectors appear much closer in direction than if we use α = 1. Likewise, if
we use α = 1/2, then the cosine is 0.619, making the vectors look quite different. We cannot
tell which value of α is “right,” but we see that the choice of scaling factor for numerical
features affects our decision about how similar items are.
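The effect of the scaling factor can be checked numerically. The following is a minimal sketch, not the book's code: the two movie vectors are hypothetical, chosen only so that each has five actor components set to 1, two of them shared, and average ratings of 3 and 4, which is consistent with the dot product 2 + 12α² stated above. The function names are likewise assumptions.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def movie_vectors(alpha):
    """Hypothetical profiles: eight actor slots, then the scaled average rating."""
    m1 = [0, 1, 1, 0, 1, 1, 0, 1, 3 * alpha]
    m2 = [1, 1, 0, 1, 0, 1, 1, 0, 4 * alpha]
    return m1, m2


def scaled_cosine(alpha):
    """Cosine between the two hypothetical movies for a given scaling factor."""
    return cosine(*movie_vectors(alpha))


# Print the cosine for the three scaling factors discussed in the text.
for alpha in (0.5, 1, 2):
    print(alpha, round(scaled_cosine(alpha), 3))
```

Under these assumed vectors, the printed cosines agree to three decimal places with the values quoted in the text for α = 1/2, 1, and 2.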
9.2.5 User Profiles
We not only need to create vectors describing items; we also need to create vectors with the
same components that describe the user's preferences. We have the utility matrix
representing the connection between users and items. Recall that the nonblank matrix entries could
be just 1s representing user purchases or a similar connection, or they could be arbitrary
numbers representing a rating or degree of affection that the user has for the item.
With this information, the best estimate we can make regarding which items the user
likes is some aggregation of the profiles of those items. If the utility matrix has only 1s,
then the natural aggregate is the average of the components of the vectors representing the
item profiles for the items for which the utility matrix has a 1 for that user.
EXAMPLE 9.3 Suppose items are movies, represented by boolean profiles with components
corresponding to actors. Also, the utility matrix has a 1 if the user has seen the movie and
is blank otherwise. If 20% of the movies that user U likes have Julia Roberts as one of the
actors, then the user profile for U will have 0.2 in the component for Julia Roberts.
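As a minimal sketch of this averaging, assuming item profiles are stored as equal-length 0/1 lists keyed by item name: the data, the component order, and the function name `boolean_user_profile` are all hypothetical, made up so that one of five seen movies features Julia Roberts.

```python
def boolean_user_profile(item_profiles, seen):
    """Average, component by component, the profiles of the items the user has a 1 for."""
    if not seen:
        raise ValueError("user has no nonblank entries in the utility matrix")
    liked = [item_profiles[item] for item in seen]
    # zip(*liked) groups the values of each component across all liked items.
    return [sum(column) / len(liked) for column in zip(*liked)]


# Hypothetical data; components are [Julia Roberts, actor B, actor C].
item_profiles = {
    "m1": [1, 0, 1],
    "m2": [0, 1, 0],
    "m3": [0, 1, 1],
    "m4": [0, 0, 1],
    "m5": [0, 1, 0],
}
seen_by_u = ["m1", "m2", "m3", "m4", "m5"]

profile_u = boolean_user_profile(item_profiles, seen_by_u)
print(profile_u)
```

With this made-up data, Julia Roberts appears in one of the five movies, so the first component of the profile is 0.2, as in the example.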
If the utility matrix is not boolean, e.g., ratings 1-5, then we can weight the vectors
representing the profiles of items by the utility value. It makes sense to normalize the utilities
by subtracting the average value for a user. That way, we get negative weights for items
with a below-average rating, and positive weights for items with above-average ratings.
That effect will prove useful when we discuss in Section 9.2.6 how to find items that a user
should like.
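One way to sketch the rating-weighted profile is shown below. It rests on an assumption the text does not spell out: each component of the user profile is taken to be the average of the normalized ratings (rating minus the user's mean) over the rated items that have that feature. The data and the name `weighted_user_profile` are hypothetical.

```python
def weighted_user_profile(item_profiles, ratings):
    """Build a user profile from ratings normalized by the user's average.

    Assumption: each component is the mean of (rating - user average) over
    the rated items whose boolean profile has a 1 in that component.
    """
    mean = sum(ratings.values()) / len(ratings)
    n_features = len(next(iter(item_profiles.values())))
    profile = []
    for k in range(n_features):
        weights = [r - mean for item, r in ratings.items() if item_profiles[item][k]]
        # A feature the user has never encountered gets a neutral weight of 0.
        profile.append(sum(weights) / len(weights) if weights else 0.0)
    return profile


# Hypothetical data; components are [actor A, actor B], ratings are on a 1-5 scale.
item_profiles = {"m1": [1, 0], "m2": [1, 0], "m3": [0, 1], "m4": [1, 1]}
ratings_u = {"m1": 5, "m2": 4, "m3": 1, "m4": 2}

weighted_profile_u = weighted_user_profile(item_profiles, ratings_u)
print(weighted_profile_u)
```

Here the user's average rating is 3, so movies rated 1 or 2 contribute negative weights. The component for actor B, who appears only in below-average movies, comes out negative, while the component for actor A is positive.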