Recommendation Systems - Mining of Massive Datasets


9.3.2 The Duality of Similarity

The utility matrix can be viewed as telling us about users or about items, or both. It is important to realize that any of the techniques we suggested in Section 9.3.1 for finding similar users can be used on columns of the utility matrix to find similar items. There are two ways in which the symmetry is broken in practice.
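To make the duality concrete, here is a minimal Python sketch. It assumes a small made-up utility matrix stored as a list of rows with `None` for blank entries, and it assumes cosine similarity with blanks treated as 0; the helper names (`cosine`, `transpose`, `user_similarity`, `item_similarity`) are hypothetical. The point is only that one similarity function serves both purposes: applied to rows it compares users, and applied to columns it compares items.

```python
import math
from typing import List, Optional

# A utility matrix as a list of rows: one row per user, one column per item.
# None marks a blank (unrated) entry. The values are made up for illustration.
Matrix = List[List[Optional[float]]]


def cosine(u: List[Optional[float]], v: List[Optional[float]]) -> float:
    """Cosine similarity of two vectors, treating blank entries as 0 (an assumption)."""
    a = [0.0 if x is None else x for x in u]
    b = [0.0 if y is None else y for y in v]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def transpose(m: Matrix) -> Matrix:
    """Turn columns into rows, so each item becomes a vector of user ratings."""
    return [list(col) for col in zip(*m)]


def user_similarity(m: Matrix, i: int, j: int) -> float:
    """Similarity of users i and j: compare rows i and j."""
    return cosine(m[i], m[j])


def item_similarity(m: Matrix, i: int, j: int) -> float:
    """Similarity of items i and j: the same function, applied to columns."""
    cols = transpose(m)
    return cosine(cols[i], cols[j])


M: Matrix = [
    [4, None, 5, 1],
    [5, 5, 4, None],
    [None, 1, None, 5],
]
print(user_similarity(M, 0, 1))  # compare users 0 and 1
print(item_similarity(M, 0, 2))  # compare items 0 and 2
```

Nothing user-specific or item-specific lives inside `cosine`; the only difference between the two uses is whether we hand it rows or columns.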

(1) We can use information about users to recommend items. That is, given a user, we can find some number of the most similar users, perhaps using the techniques of Chapter 3. We can base our recommendation on the decisions made by these similar users, e.g., recommend the items that the greatest number of them have purchased or rated highly. However, there is no symmetry. Even if we find pairs of similar items, we need to take an additional step in order to recommend items to users. This point is explored further at the end of this subsection.

(2) There is a difference in the typical behavior of users and items, as it pertains to similarity. Intuitively, items tend to be classifiable in simple terms. For example, music tends to belong to a single genre. It is impossible, e.g., for a piece of music to be both 60's rock and 1700's baroque. On the other hand, there are individuals who like both 60's rock and 1700's baroque, and who buy examples of both types of music. The consequence is that it is easier to discover items that are similar because they belong to the same genre than it is to detect that two users are similar because they prefer one genre in common, while each also likes some genres that the other doesn't care for.
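Point (1) can be sketched for purchase data as follows. This assumes each user is represented by the set of items they bought and uses Jaccard similarity, the set similarity studied in Chapter 3; the function names, the input format, and the choice to skip items the user already owns are illustrative assumptions, not taken from the text.

```python
from collections import Counter
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(purchases: Dict[str, Set[str]], user: str, n: int, k: int) -> List[str]:
    """Return up to k items bought by the largest number of `user`'s n most similar users.

    `purchases` (a hypothetical input format) maps each user to the set of
    items they bought. Items that `user` already owns are not recommended.
    """
    mine = purchases[user]
    others = [u for u in purchases if u != user]
    # Rank the other users by Jaccard similarity to `user`; keep the n closest.
    neighbors = sorted(others, key=lambda u: jaccard(mine, purchases[u]), reverse=True)[:n]
    # Count how many of those neighbors bought each item that `user` lacks.
    counts = Counter(item for u in neighbors for item in purchases[u] - mine)
    return [item for item, _ in counts.most_common(k)]


purchases = {
    "alice": {"a", "b", "c"},
    "bob": {"a", "b", "d"},
    "carol": {"a", "c", "d", "e"},
    "dave": {"x", "y"},
}
print(recommend(purchases, "alice", n=2, k=2))  # ['d', 'e']
```

Here alice's two nearest neighbors are bob (similarity 2/4) and carol (2/5); both bought `d` and only carol bought `e`, so `d` ranks first.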

As we suggested in (1) above, one way of predicting the value of the utility-matrix entry for user U and item I is to find the n users (for some predetermined n) most similar to U and average their ratings for item I, counting only those among the n similar users who have rated I. It is generally better to normalize the matrix first. That is, for each of the n users, subtract that user's average rating over all items from their rating for I. Average these differences over the users who have rated I, and then add this average to the average rating that U gives for all items. This normalization adjusts the estimate in the case that U tends to give very high or very low ratings, or that a large fraction of the similar users who rated I (of which there may be only a few) are users who tend to rate very high or very low.
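The normalized user-user estimate can be written as a short Python sketch. Assumptions not in the text: ratings are stored as a hypothetical dictionary `user -> {item: rating}` with absent keys as blanks, every user has rated at least one item, neighbors are ranked by cosine similarity with blanks treated as 0, and `predict_user_based` returns `None` when none of the n neighbors has rated the item.

```python
import math
from typing import Dict, Optional

# Utility matrix keyed by user: user -> {item: rating}. A missing key is a blank.
# Assumes every user has rated at least one item, so averages are defined.
Ratings = Dict[str, Dict[str, float]]


def mean(vec: Dict[str, float]) -> float:
    """Average of the ratings in a row (or column)."""
    return sum(vec.values()) / len(vec)


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors, blanks treated as 0 (an assumption)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


def predict_user_based(ratings: Ratings, user: str, item: str, n: int) -> Optional[float]:
    """Normalized estimate of ratings[user][item] from the n users most similar to `user`."""
    others = [v for v in ratings if v != user]
    neighbors = sorted(others, key=lambda v: cosine(ratings[user], ratings[v]), reverse=True)[:n]
    # For each neighbor who rated `item`: their rating minus their own average rating.
    deviations = [ratings[v][item] - mean(ratings[v]) for v in neighbors if item in ratings[v]]
    if not deviations:
        return None  # none of the n similar users rated the item
    # Add the average deviation to the average rating that `user` gives.
    return mean(ratings[user]) + sum(deviations) / len(deviations)


ratings: Ratings = {
    "U": {"A": 4, "B": 2},
    "V": {"A": 5, "B": 3, "I": 5},
    "W": {"A": 4, "B": 1, "I": 2},
    "X": {"C": 5},
}
print(predict_user_based(ratings, "U", "I", n=2))
```

With this made-up data and n = 2, the neighbors of U are W and V. Their deviations for I are 2 - 7/3 = -1/3 and 5 - 13/3 = 2/3, whose average is 1/6, so the estimate is U's average of 3 plus 1/6, about 3.17.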

Dually, we can use item similarity to estimate the entry for user U and item I. Find the m items most similar to I, for some m, and take the average rating, among the m items, of the ratings that U has given. As for user-user similarity, we consider only those items among the m that U has rated, and it is probably wise to normalize item ratings first.
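The item-item estimate can be sketched the same way, working on columns instead of rows. We assume that normalizing item ratings means the dual of the user-side normalization: subtract each similar item's average rating from U's rating of it, average those deviations, and add the result to I's average rating. As before, the dictionary layout, cosine similarity with blanks as 0, and the function names are hypothetical.

```python
import math
from typing import Dict, Optional

# Utility matrix keyed by user: user -> {item: rating}. A missing key is a blank.
Ratings = Dict[str, Dict[str, float]]


def mean(vec: Dict[str, float]) -> float:
    """Average of the ratings in a row (or column)."""
    return sum(vec.values()) / len(vec)


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors, blanks treated as 0 (an assumption)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0


def by_item(ratings: Ratings) -> Ratings:
    """Re-index the matrix by column: item -> {user: rating}."""
    cols: Ratings = {}
    for user, row in ratings.items():
        for item, r in row.items():
            cols.setdefault(item, {})[user] = r
    return cols


def predict_item_based(ratings: Ratings, user: str, item: str, m: int) -> Optional[float]:
    """Normalized estimate of ratings[user][item] from the m items most similar to `item`."""
    cols = by_item(ratings)
    if item not in cols:
        return None  # nobody has rated `item`, so its average is undefined
    others = [j for j in cols if j != item]
    similar = sorted(others, key=lambda j: cosine(cols[item], cols[j]), reverse=True)[:m]
    row = ratings.get(user, {})
    # Only similar items that `user` has rated count; subtract each item's average.
    deviations = [row[j] - mean(cols[j]) for j in similar if j in row]
    if not deviations:
        return None  # `user` rated none of the m similar items
    # Add the average deviation back to the target item's own average rating.
    return mean(cols[item]) + sum(deviations) / len(deviations)


ratings: Ratings = {
    "U": {"A": 4, "B": 2},
    "V": {"A": 5, "B": 3, "I": 5},
    "W": {"A": 4, "B": 1, "I": 2},
}
print(predict_item_based(ratings, "U", "I", m=2))
```

For clarity the sketch rebuilds the item columns on every call; precomputing them once would avoid repeated work when many entries of a row are estimated.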

Note that whichever approach to estimating entries in the utility matrix we use, it is not sufficient to find only one entry. In order to recommend items to a user U, we need to estimate every entry in the row of the utility matrix for U, or at least find all or most of the
