Recommendation Systems - Mining of Massive Datasets


entries in that row that are blank but have a high estimated value. There is a tradeoff regarding whether we should work from similar users or similar items.

• If we find similar users, then we only have to do the process once for user U. From the set of similar users we can estimate all the blanks in the utility matrix for U. If we work from similar items, we have to compute similar items for almost all items before we can estimate the row for U.

• On the other hand, item-item similarity often provides more reliable information, because of the phenomenon observed above, namely that it is easier to find items of the same genre than it is to find users that like only items of a single genre.
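The tradeoff can be made concrete with a minimal sketch. It assumes the utility matrix is stored as a dictionary of sparse rows, that similarity is cosine similarity with blanks treated as 0, and that a blank is estimated by averaging the ratings of the k nearest neighbors. The function names and the toy ratings are hypothetical and chosen only for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (dicts); blanks act as 0."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def estimate_from_users(utility, user, k=1):
    """Fill blanks in `user`'s row after a single search for similar users."""
    row = utility[user]
    others = [u for u in utility if u != user]
    # One neighbor search serves every blank in the row.
    neighbors = sorted(others, key=lambda u: cosine(row, utility[u]), reverse=True)[:k]
    all_items = {item for r in utility.values() for item in r}
    estimates = {}
    for item in all_items - set(row):
        seen = [utility[n][item] for n in neighbors if item in utility[n]]
        if seen:  # leave the entry blank if no neighbor rated the item
            estimates[item] = sum(seen) / len(seen)
    return estimates

def estimate_from_items(utility, user, k=1):
    """Fill blanks in `user`'s row with one similar-item search per blank item."""
    row = utility[user]
    columns = {}  # transpose: item -> {user: rating}
    for u, r in utility.items():
        for item, rating in r.items():
            columns.setdefault(item, {})[u] = rating
    estimates, searches = {}, 0
    for item in set(columns) - set(row):
        searches += 1  # a separate neighbor search for each blank item
        neighbors = sorted(row, key=lambda j: cosine(columns[item], columns[j]), reverse=True)[:k]
        if neighbors:
            estimates[item] = sum(row[j] for j in neighbors) / len(neighbors)
    return estimates, searches

# Hypothetical toy utility matrix: user -> {item: rating}.
toy = {
    "U": {"a": 5, "b": 4},
    "V": {"a": 5, "b": 5, "c": 4},
    "W": {"a": 1, "c": 2, "d": 5},
}
user_based = estimate_from_users(toy, "U")
item_based, searches = estimate_from_items(toy, "U")
```

In this sketch the user-based routine performs one neighbor search for U, while the item-based routine performs one search per blank entry in U's row, which is the extra work the first bullet describes.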

Whichever method we choose, we should precompute preferred items for each user, rather than waiting until we need to make a decision. Since the utility matrix evolves slowly, it is generally sufficient to recompute these preferences infrequently and assume that they remain fixed between recomputations.

9.3.3 Clustering Users and Items

It is hard to detect similarity among either items or users, because we have little information about user-item pairs in the sparse utility matrix. From the perspective of Section 9.3.2, even if two items belong to the same genre, there are likely to be very few users who bought or rated both. Likewise, even if two users both like a genre or genres, they may not have bought any items in common.

One way of dealing with this pitfall is to cluster items and/or users. Select any of the distance measures suggested in Section 9.3.1, or any other distance measure, and use it to perform a clustering of, say, items. Any of the methods suggested in Chapter 7 can be used. However, we shall see that there may be little reason to try to cluster into a small number of clusters immediately. Rather, a hierarchical approach, where we leave many clusters unmerged, may suffice as a first step. For example, we might leave half as many clusters as there are items.
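As a rough illustration of stopping a hierarchical clustering early, the sketch below merges the two closest clusters of items repeatedly until only half as many clusters as items remain. It assumes cosine distance between sparse item columns and average linkage between clusters. Both are choices made for this example rather than requirements, and the item names and ratings are hypothetical.

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    """1 minus the cosine similarity of two sparse columns (dicts); blanks act as 0."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if not norm_a or not norm_b:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def cluster_items(columns, target):
    """Naive agglomerative clustering that stops once `target` clusters remain."""
    clusters = [[name] for name in sorted(columns)]

    def linkage(c1, c2):
        # Average linkage: mean distance over all cross-cluster pairs of items.
        total = sum(cosine_distance(columns[i], columns[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > target:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters

# Hypothetical item columns: item -> {user: rating}.
item_columns = {
    "x1": {"A": 5, "B": 4},
    "x2": {"A": 4, "B": 5},
    "y1": {"C": 5, "D": 3},
    "y2": {"C": 4, "D": 4},
}
# Leave half as many clusters as there are items.
half = max(1, len(item_columns) // 2)
item_clusters = cluster_items(item_columns, half)
```

This naive loop recomputes every pairwise linkage on each merge, so it is suitable only for small examples. Any of the clustering methods of Chapter 7 could take its place.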

EXAMPLE 9.10 Figure 9.7 shows what happens to the utility matrix of Fig. 9.4 if we manage to cluster the three Harry-Potter movies into one cluster, denoted HP, and also cluster the three Star-Wars movies into one cluster, denoted SW. □

Figure 9.7 Utility matrix for users and clusters of items
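A utility matrix whose columns are clusters might be built as in the following sketch. It assumes that the entry for a user and a cluster is the average of the ratings that user gave to members of the cluster, and that the entry stays blank if the user rated none of them. This averaging rule is an assumption of the sketch, not something stated in the passage above. The ratings are hypothetical and are not the values of Fig. 9.4.

```python
def collapse_items(utility, cluster_of):
    """Replace item columns by cluster columns, averaging each user's ratings per cluster."""
    collapsed = {}
    for user, row in utility.items():
        grouped = {}
        for item, rating in row.items():
            # Items that belong to no cluster keep their own column.
            grouped.setdefault(cluster_of.get(item, item), []).append(rating)
        collapsed[user] = {c: sum(rs) / len(rs) for c, rs in grouped.items()}
    return collapsed

# Hypothetical cluster assignment in the spirit of Example 9.10.
cluster_of = {
    "HP1": "HP", "HP2": "HP", "HP3": "HP",
    "SW1": "SW", "SW2": "SW", "SW3": "SW",
}
# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "P": {"HP1": 4, "HP2": 2, "SW1": 5},
    "Q": {"HP3": 3, "OTHER": 1},
}
by_cluster = collapse_items(ratings, cluster_of)
```

Here users P and Q rated no Harry-Potter movie in common, yet both have an entry in the HP column after collapsing, so the two rows now overlap.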
