Database Reference
In-Depth Information
Having clustered items to an extent, we can revise the utility matrix so the columns rep-
resent clusters of items, and the entry for user U and cluster C is the average rating that U
gave to those members of cluster C that U did rate. Note that U may have rated none of the
cluster members, in which case the entry for U and C is still blank.
We can use this revised utility matrix to cluster users, again using the distance measure
we consider most appropriate. Use a clustering algorithm that again leaves many clusters,
e.g., half as many clusters as there are users. Revise the utility matrix, so the rows corres-
pond to clusters of users, just as the columns correspond to clusters of items. As for item-
clusters, compute the entry for a user cluster by averaging the ratings of the users in the
cluster.
Now, this process can be repeated several times if we like. That is, we can cluster the
item clusters and again merge the columns of the utility matrix that belong to one cluster.
We can then turn to the users again, and cluster the user clusters. The process can repeat
until we have an intuitively reasonable number of clusters of each kind.
Once we have clustered the users and/or items to the desired extent and computed the
cluster-cluster utility matrix, we can estimate entries in the original utility matrix as fol-
lows. Suppose we want to predict the entry for user U and item I :
(a) Find the clusters to which U and I belong, say clusters C and D , respectively.
(b) If the entry in the cluster-cluster utility matrix for C and D is something other than
blank, use this value as the estimated value for the U - I entry in the original utility mat-
rix.
(c) If the entry for C - D is blank, then use the method outlined in Section 9.3.2 to estimate
that entry by considering clusters similar to C or D . Use the resulting estimate as the
estimate for the U - I entry.
9.3.4
Exercises for Section 9.3
EXERCISE 9.3.1 Figure 9.8 is a utility matrix, representing the ratings, on a 1-5 star scale,
of eight items, a through h , by three users A , B , and C . Compute the following from the
data of this matrix.
Figure 9.8 A utility matrix for exercises
(a) Treating the utility matrix as boolean, compute the Jaccard distance between each pair
of users.
Search WWH ::




Custom Search