but its small size will make clear some of the pitfalls in picking a distance
measure. Observe specifically the users A and C. They rated two movies in
common, but they appear to have almost diametrically opposite opinions of
these movies. We would expect that a good distance measure would make
them rather far apart. Here are some alternative measures to consider.
     HP1  HP2  HP3  TW   SW1  SW2  SW3
A    4              5    1
B    5    5    4
C                   2    4    5
D         3                        3
Figure 9.4: The utility matrix introduced in Fig. 9.1
Jaccard Distance
We could ignore values in the matrix and focus only on the sets of items rated.
If the utility matrix only reflected purchases, this measure would be a good
one to choose. However, when utilities are more detailed ratings, the Jaccard
distance loses important information.
Example 9.7 : A and B have an intersection of size 1 and a union of size 5.
Thus, their Jaccard similarity is 1/5, and their Jaccard distance is 4/5; i.e.,
they are very far apart. In comparison, A and C have a Jaccard similarity of
2/4, so their Jaccard distance is the same, 1/2. Thus, A appears closer to C
than to B. Yet that conclusion seems intuitively wrong. A and C disagree on
the two movies they both watched, while A and B seem both to have liked the
one movie they watched in common.
□
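The arithmetic in Example 9.7 can be checked with a minimal Python sketch. The dictionary `ratings` and the function `jaccard_distance` are illustrative names, not part of the original text, and the item positions for rows A, B and C are those implied by the intersections and unions quoted in the example. Only the sets of rated items are used; the rating values themselves are ignored, which is exactly the information loss described above.

```python
from fractions import Fraction

# Rows A, B, C of the utility matrix; a missing key means "not rated".
# (Hypothetical encoding chosen for this sketch.)
ratings = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
}

def jaccard_distance(u, v):
    """Return 1 - |intersection| / |union| of the sets of items rated by u and v."""
    rated_u, rated_v = set(ratings[u]), set(ratings[v])
    similarity = Fraction(len(rated_u & rated_v), len(rated_u | rated_v))
    return 1 - similarity

print(jaccard_distance("A", "B"))  # 4/5
print(jaccard_distance("A", "C"))  # 1/2
```

Exact fractions are used here only so that the printed values match the form in which the example states them.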
Cosine Distance
We can treat blanks as a 0 value. This choice is questionable, since it has the
effect of treating the lack of a rating as more similar to disliking the movie than
liking it.
Example 9.8 : The cosine of the angle between A and B is
\[
\frac{4 \times 5}{\sqrt{4^2 + 5^2 + 1^2}\,\sqrt{5^2 + 5^2 + 4^2}} = 0.380
\]
The cosine of the angle between A and C is
\[
\frac{5 \times 2 + 1 \times 4}{\sqrt{4^2 + 5^2 + 1^2}\,\sqrt{2^2 + 4^2 + 5^2}} = 0.322
\]
Since a larger (positive) cosine implies a smaller angle and therefore a smaller
distance, this measure tells us that A is slightly closer to B than to C.
□
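The cosine computation in Example 9.8 can be reproduced with a short Python sketch. The names `ratings` and `cosine` are assumptions made for illustration. Treating a blank as 0 means that unrated items contribute nothing to the dot product or to the vector lengths, so the sketch sums only over the entries that are present.

```python
import math

# Rows A, B, C of the utility matrix; a missing key stands for a blank,
# which this measure treats as 0. (Hypothetical encoding for this sketch.)
ratings = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
}

def cosine(u, v):
    """Cosine of the angle between the rating vectors of u and v, blanks as 0."""
    ru, rv = ratings[u], ratings[v]
    # Only items rated by both users give a nonzero product.
    dot = sum(ru[m] * rv[m] for m in ru.keys() & rv.keys())
    norm_u = math.sqrt(sum(x * x for x in ru.values()))
    norm_v = math.sqrt(sum(x * x for x in rv.values()))
    return dot / (norm_u * norm_v)

print(round(cosine("A", "B"), 3))
print(round(cosine("A", "C"), 3))
```

A larger cosine means a smaller angle, so comparing the two printed values shows which pair this measure places closer together.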