entries in that row that are blank but have a high estimated value. There is a tradeoff
regarding whether we should work from similar users or similar items.
• If we find similar users, then we only have to do the process once for user U. From
the set of similar users we can estimate all the blanks in the utility matrix for U. If
we work from similar items, we have to compute similar items for almost all items,
before we can estimate the row for U .
• On the other hand, item-item similarity often provides more reliable information,
because of the phenomenon observed above, namely that it is easier to find items
of the same genre than it is to find users that like only items of a single genre.
Whichever method we choose, we should precompute preferred items for each user, rather
than waiting until we need to make a decision. Since the utility matrix evolves slowly, it
is generally sufficient to recompute these preferences infrequently and to assume that they
remain fixed between recomputations.
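The user-based side of this tradeoff can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the utility matrix is a dict of sparse rows, similarity is cosine similarity with blanks treated as 0, and a blank is estimated as the plain average of the ratings given by the k most similar users. The names `cosine`, `estimate_row`, the parameter `k`, and the toy ratings are all hypothetical and do not come from the text.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two sparse rows (dict item -> rating); blanks count as 0."""
    dot = sum(a[i] * b[i] for i in a if i in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def estimate_row(utility, user, k=2):
    """Estimate the blanks in `user`'s row from the k most similar users.

    Assumption: an estimate is the unweighted average of the ratings that
    the chosen neighbors gave to the item.
    """
    row = utility[user]
    # The similar users are found once; every blank is then filled from them.
    neighbors = sorted(
        (u for u in utility if u != user),
        key=lambda u: cosine(row, utility[u]),
        reverse=True,
    )[:k]
    blanks = {i for u in neighbors for i in utility[u]} - set(row)
    estimates = {}
    for item in blanks:
        ratings = [utility[u][item] for u in neighbors if item in utility[u]]
        estimates[item] = sum(ratings) / len(ratings)
    return estimates

# Hypothetical toy data: four users, four items.
utility = {
    "U": {"a": 5, "b": 4},
    "V": {"a": 5, "b": 5, "c": 4},
    "W": {"a": 4, "b": 4, "d": 2},
    "X": {"c": 1, "d": 5},
}
estimates = estimate_row(utility, "U", k=2)
print(estimates)
```

Note that the neighbor search runs once per user, which is the advantage of the user-based approach described above. An item-based version would need a neighbor search for each item whose blank is to be estimated.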
9.3.3 Clustering Users and Items
It is hard to detect similarity among either items or users, because we have little information
about user-item pairs in the sparse utility matrix. As discussed in Section 9.3.2, even
if two items belong to the same genre, there are likely to be very few users who bought
or rated both. Likewise, even if two users both like a genre or genres, they may not have
bought any items in common.
One way of dealing with this pitfall is to cluster items and/or users. Select any of the
distance measures suggested in Section 9.3.1, or any other distance measure, and use it to
perform a clustering of, say, items. Any of the methods suggested in Chapter 7 can be used.
However, we shall see that there may be little reason to try to cluster into a small number
of clusters immediately. Rather, a hierarchical approach, where we leave many clusters
unmerged, may suffice as a first step. For example, we might leave half as many clusters as
there are items.
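A minimal sketch of this first clustering step follows. It is an illustration only and makes several assumptions: each item is represented by the set of users who rated it, the distance between two items is the Jaccard distance of those sets, and the distance between two clusters is the average distance over all pairs of members. The function names and the toy data are hypothetical. Merging stops as soon as half as many clusters remain as there are items.

```python
from itertools import combinations

def jaccard_distance(s, t):
    """1 minus the Jaccard similarity of two sets."""
    union = s | t
    return 1.0 - len(s & t) / len(union) if union else 0.0

def cluster_items(raters, target):
    """Agglomerative clustering: merge the closest pair until `target` clusters remain.

    `raters` maps each item to the set of users who rated it.
    Assumption: cluster distance is the average pairwise item distance.
    """
    clusters = [[item] for item in sorted(raters)]

    def dist(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(jaccard_distance(raters[i], raters[j]) for i, j in pairs) / len(pairs)

    while len(clusters) > target:
        # Indices satisfy a < b, so deleting b never shifts position a.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: dist(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Hypothetical toy data: which users rated which items.
raters = {
    "i1": {"A", "B"},
    "i2": {"A", "B", "C"},
    "i3": {"D", "E"},
    "i4": {"D", "E", "F"},
}
clusters = cluster_items(raters, target=len(raters) // 2)
print(clusters)
```

This naive version recomputes all cluster distances on every merge, which is acceptable for a toy example but would need a more careful implementation on a large item set.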
EXAMPLE 9.10 Figure 9.7 shows what happens to the utility matrix of Fig. 9.4 if we manage
to cluster the three Harry-Potter movies into one cluster, denoted HP, and also cluster the
three Star-Wars movies into one cluster SW.
Figure 9.7 Utility matrix for users and clusters of items
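The step from a utility matrix over items to one over clusters of items can be sketched as below. The passage here does not state how each cluster entry is computed, so the sketch assumes that the entry for a user and a cluster is the average of the ratings that user gave to the cluster's members, and that it stays blank if the user rated none of them. The ratings are invented for illustration and are not the values of Fig. 9.4. The name `collapse_by_cluster` is hypothetical.

```python
def collapse_by_cluster(utility, cluster_of):
    """Replace item columns by cluster columns.

    Assumption: a cluster entry is the average of the user's ratings for
    the items in that cluster; clusters with no rated member stay blank.
    """
    collapsed = {}
    for user, row in utility.items():
        grouped = {}
        for item, rating in row.items():
            grouped.setdefault(cluster_of[item], []).append(rating)
        collapsed[user] = {c: sum(r) / len(r) for c, r in grouped.items()}
    return collapsed

# Item-to-cluster assignment following the example: three movies per cluster.
cluster_of = {
    "HP1": "HP", "HP2": "HP", "HP3": "HP",
    "SW1": "SW", "SW2": "SW", "SW3": "SW",
}
# Invented ratings for three hypothetical users P, Q, R.
utility = {
    "P": {"HP1": 4, "HP3": 2, "SW1": 1},
    "Q": {"HP2": 5},
    "R": {"SW1": 3, "SW2": 5},
}
collapsed = collapse_by_cluster(utility, cluster_of)
print(collapsed)
```

The collapsed matrix has fewer columns and fewer blanks per column, so two users are more likely to have a rated column in common than they were before clustering.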