(b) Repeat Part (a), but use the cosine distance.
(c) Treat ratings of 3, 4, and 5 as 1, and treat ratings of 1 and 2 and blanks as 0. Compute
the Jaccard distance between each pair of users.
(d) Repeat Part (c), but use the cosine distance.
(e) Normalize the matrix by subtracting from each nonblank entry the average value for
its user.
(f) Using the normalized matrix from Part (e), compute the cosine distance between each
pair of users.
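The operations named in this exercise can be sketched in a few lines of Python. This is a minimal illustration, not a solution: the matrix below is a hypothetical toy example (it is not the matrix of Fig. 9.8), the function names are our own, and the cosine distance is taken to be the angle between the two vectors with blanks treated as 0, which is an assumption about the intended definition (some texts use 1 minus the cosine instead).

```python
import math

# Hypothetical toy utility matrix (NOT the matrix of Fig. 9.8).
# Rows are users, columns are items, and None marks a blank entry.
ratings = {
    "U1": [5, None, 1, 4],
    "U2": [4, 2, None, 5],
    "U3": [None, 5, 4, 1],
}

def binarize(row, threshold=3):
    """Map ratings >= threshold to 1; lower ratings and blanks map to 0."""
    return [1 if v is not None and v >= threshold else 0 for v in row]

def jaccard_distance(x, y):
    """1 - |intersection| / |union| over the positions holding nonzero entries."""
    sx = {i for i, v in enumerate(x) if v}
    sy = {i for i, v in enumerate(y) if v}
    union = sx | sy
    if not union:
        return 0.0  # convention chosen here: two empty sets are at distance 0
    return 1.0 - len(sx & sy) / len(union)

def cosine_distance(x, y):
    """Angle (in radians) between x and y, treating blanks as 0 (assumed definition)."""
    xs = [0 if v is None else v for v in x]
    ys = [0 if v is None else v for v in y]
    dot = sum(a * b for a, b in zip(xs, ys))
    nx = math.sqrt(sum(a * a for a in xs))
    ny = math.sqrt(sum(b * b for b in ys))
    if nx == 0 or ny == 0:
        raise ValueError("cosine distance is undefined for an all-zero vector")
    cos = max(-1.0, min(1.0, dot / (nx * ny)))  # clamp against rounding error
    return math.acos(cos)

def normalize(row):
    """Subtract the user's average rating from each nonblank entry.

    Assumes the row has at least one nonblank entry.
    """
    known = [v for v in row if v is not None]
    mean = sum(known) / len(known)
    return [None if v is None else v - mean for v in row]

users = sorted(ratings)
for i, a in enumerate(users):
    for b in users[i + 1:]:
        jd = jaccard_distance(binarize(ratings[a]), binarize(ratings[b]))
        cd = cosine_distance(normalize(ratings[a]), normalize(ratings[b]))
        print(a, b, "Jaccard (binarized):", jd, "cosine (normalized):", cd)
```

Keeping blanks as None until the distance is computed makes it explicit where a blank is being treated as 0, which is the choice that distinguishes the parts of the exercise.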
EXERCISE 9.3.2 In this exercise, we cluster items in the matrix of Fig. 9.8. Do the following
steps.
(a) Cluster the eight items hierarchically into four clusters, using the following method.
Replace all 3s, 4s, and 5s by 1 and replace 1s, 2s, and blanks by 0. Use the Jaccard
distance to measure the distance between the resulting column vectors. For clusters of
more than one element, take the distance between clusters to be the minimum distance
between pairs of elements, one from each cluster.
(b) Then, construct from the original matrix of Fig. 9.8 a new matrix whose rows
correspond to users, as before, and whose columns correspond to clusters. Compute the
entry for a user and cluster of items by averaging the nonblank entries for that user and
all the items in the cluster.
(c) Compute the cosine distance between each pair of users, according to your matrix from
Part (b).
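The procedure described in Parts (a) and (b) can be sketched as follows. This is an illustrative sketch under stated assumptions, not a solution: the matrix is a hypothetical toy example with four items (it is not the matrix of Fig. 9.8), the function names are our own, and ties between equally close cluster pairs are broken by taking the first pair found.

```python
from itertools import combinations

# Hypothetical toy utility matrix (NOT the matrix of Fig. 9.8).
# Rows are users, columns are items, and None marks a blank entry.
items = ["a", "b", "c", "d"]
matrix = {
    "U1": [5, 4, None, 1],
    "U2": [4, None, 2, 1],
    "U3": [None, 5, 1, 4],
}
users = sorted(matrix)

def binarize(col, threshold=3):
    """Map ratings >= threshold to 1; lower ratings and blanks map to 0."""
    return [1 if v is not None and v >= threshold else 0 for v in col]

def jaccard_distance(x, y):
    """1 - |intersection| / |union| over the positions holding nonzero entries."""
    sx = {i for i, v in enumerate(x) if v}
    sy = {i for i, v in enumerate(y) if v}
    union = sx | sy
    if not union:
        return 0.0  # convention chosen here: two empty sets are at distance 0
    return 1.0 - len(sx & sy) / len(union)

def single_link_clusters(vectors, k):
    """Merge the two closest clusters until only k remain.

    The distance between clusters is the minimum distance between
    pairs of elements, one from each cluster.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(
                jaccard_distance(vectors[a], vectors[b])
                for a in clusters[p[0]]
                for b in clusters[p[1]]
            ),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters

# Part (a): cluster the binarized item columns.
columns = [binarize([matrix[u][c] for u in users]) for c in range(len(items))]
clusters = single_link_clusters(columns, k=3)

def cluster_average(row, cluster):
    """Average of the user's nonblank ratings for the items in the cluster."""
    known = [row[c] for c in cluster if row[c] is not None]
    return sum(known) / len(known) if known else None

# Part (b): build the user-by-cluster matrix.
reduced = {u: [cluster_average(matrix[u], cl) for cl in clusters] for u in users}

print([[items[c] for c in cl] for cl in clusters])
for u in users:
    print(u, reduced[u])
```

An entry of the reduced matrix stays blank (None) only when the user rated none of the items in that cluster, so the reduced matrix is typically denser than the original.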
9.4 Dimensionality Reduction
An entirely different approach to estimating the blank entries in the utility matrix is to
conjecture that the utility matrix is actually the product of two long, thin matrices. This
view makes sense if there is a relatively small set of features of items and users that
determine the reaction of most users to most items. In this section, we sketch one approach
to discovering two such matrices; the approach is called “UV-decomposition,” and it is an
instance of a more general theory called SVD (singular-value decomposition).
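To make the conjecture concrete, the sketch below fits an n-by-d matrix U and a d-by-m matrix V so that their product approximates the nonblank entries of a small utility matrix, and then reads an estimate for a blank entry off the product. Everything here is an assumption made for illustration: the toy matrix, the choice d = 2, the starting values, and the use of plain stochastic gradient descent on the squared error, which is a common fitting technique and is not necessarily the procedure this section goes on to describe.

```python
import math
import random

# Hypothetical toy utility matrix; None marks a blank entry.
M = [
    [5, 3, None, 1],
    [4, None, 2, 1],
    [None, 5, 4, 2],
]
n, m, d = len(M), len(M[0]), 2  # d is the assumed number of hidden features

rng = random.Random(0)  # fixed seed so the run is repeatable
U = [[1.0 + 0.1 * rng.random() for _ in range(d)] for _ in range(n)]  # n x d
V = [[1.0 + 0.1 * rng.random() for _ in range(m)] for _ in range(d)]  # d x m

observed = [(i, j, M[i][j]) for i in range(n) for j in range(m) if M[i][j] is not None]

def predict(i, j):
    """Entry (i, j) of the product UV."""
    return sum(U[i][k] * V[k][j] for k in range(d))

def rmse():
    """Root-mean-square error of UV over the nonblank entries of M only."""
    return math.sqrt(sum((r - predict(i, j)) ** 2 for i, j, r in observed) / len(observed))

rmse_before = rmse()

learning_rate = 0.01
for _ in range(2000):
    for i, j, r in observed:
        err = r - predict(i, j)
        for k in range(d):
            u, v = U[i][k], V[k][j]
            # Move both factors a small step in the direction that reduces err**2.
            U[i][k] += learning_rate * err * v
            V[k][j] += learning_rate * err * u

rmse_after = rmse()
print("RMSE before:", rmse_before, "after:", rmse_after)
print("estimate for blank entry (0, 2):", predict(0, 2))
```

Only the nonblank entries contribute to the error, so the blanks place no constraint on U and V; the product UV nevertheless has a value in every position, and those values serve as the estimates.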