Figure 9.4 The utility matrix introduced in Fig. 9.1
Jaccard Distance
We could ignore values in the matrix and focus only on the sets of items rated. If the utility
matrix only reflected purchases, this measure would be a good one to choose. However,
when utilities are more detailed ratings, the Jaccard distance loses important information.
EXAMPLE 9.7 A and B have an intersection of size 1 and a union of size 5. Thus, their
Jaccard similarity is 1/5, and their Jaccard distance is 4/5; i.e., they are very far apart.
In comparison, A and C have a Jaccard similarity of 2/4, so their Jaccard distance is
1 - 2/4 = 1/2. Thus, A appears closer to C than to B. Yet that conclusion seems intuitively
wrong. A and C disagree on the two movies they both watched, while A and B seem both to
have liked the one movie they watched in common.
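The arithmetic in Example 9.7 can be checked with a short sketch. The ratings below are an assumption: the utility matrix of Fig. 9.4 is not reproduced in this text, so the item names (HP1, TW, and so on) and values are illustrative, chosen to match the set sizes quoted in the example. The helper name jaccard_distance is likewise hypothetical.

```python
from fractions import Fraction

# Assumed ratings for users A, B, and C (illustrative stand-in for Fig. 9.4).
# An unrated item is simply absent from the user's dict.
ratings = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
}


def jaccard_distance(u, v):
    """Return 1 - |rated by both| / |rated by either|, ignoring rating values."""
    items_u, items_v = set(u), set(v)
    union = items_u | items_v
    if not union:
        # Convention chosen here: two users with no ratings are at distance 0.
        return Fraction(0)
    return 1 - Fraction(len(items_u & items_v), len(union))


d_ab = jaccard_distance(ratings["A"], ratings["B"])
d_ac = jaccard_distance(ratings["A"], ratings["C"])
print(d_ab, d_ac)  # 4/5 1/2
```

Because only the dictionary keys are used, the function never sees that A and C gave opposite ratings to the movies they share, which is the loss of information the example points out.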
Cosine Distance
We can treat blanks as a 0 value. This choice is questionable, since it has the effect of
treating the lack of a rating as more similar to disliking the movie than liking it.
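EXAMPLE 9.8 The cosine of the angle between A and B is
\[ \frac{4 \times 5}{\sqrt{4^2 + 5^2 + 1^2}\,\sqrt{5^2 + 5^2 + 4^2}} = 0.380 \]
The cosine of the angle between A and C is
\[ \frac{5 \times 2 + 1 \times 4}{\sqrt{4^2 + 5^2 + 1^2}\,\sqrt{2^2 + 4^2 + 5^2}} = 0.322 \]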
Since a larger (positive) cosine implies a smaller angle and therefore a smaller distance,
this measure tells us that A is slightly closer to B than to C.
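The comparison in Example 9.8 can be sketched in a few lines. As before, the ratings are an assumption standing in for Fig. 9.4, which is not reproduced in this text, and the helper name cosine_similarity is hypothetical. Blanks are treated as 0, so they contribute nothing to the dot product or to the norms.

```python
import math

# Assumed ratings for users A, B, and C (illustrative stand-in for Fig. 9.4).
# Missing items are treated as 0.
ratings = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse rating vectors.

    Assumes each user has at least one nonzero rating, so neither norm is 0.
    """
    # Only items rated by both users contribute to the dot product.
    dot = sum(r * v[item] for item, r in u.items() if item in v)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v)


cos_ab = cosine_similarity(ratings["A"], ratings["B"])
cos_ac = cosine_similarity(ratings["A"], ratings["C"])
print(round(cos_ab, 3), round(cos_ac, 3))
```

Under these assumed ratings the cosine for A and B comes out larger than the cosine for A and C, so A is slightly closer to B, reversing the ordering that the Jaccard distance gave.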
Rounding the Data
We could try to eliminate the apparent similarity between movies a user rates highly and
those with low scores by rounding the ratings. For instance, we could consider ratings of
3, 4, and 5 as a “1” and consider ratings 1 and 2 as unrated. The utility matrix would then
look as in Fig. 9.5. Now, the Jaccard distance between A and B is 3/4, while between A and