Rounding the Data
We could try to eliminate the apparent similarity between movies a user rates
highly and those the user rates poorly by rounding the ratings. For instance, we
could treat ratings of 3, 4, and 5 as a “1” and treat ratings of 1 and 2 as unrated.
The utility matrix would then look as in Fig. 9.5. Now, the Jaccard distance
between A and B is 3/4, since they share one movie out of the four that either
has rated, while between A and C it is 1, since they share none; i.e., C appears
farther from A than B does, which is intuitively correct. Applying cosine distance
to Fig. 9.5 leads to the same conclusion.
     HP1    HP2    HP3    TW     SW1    SW2    SW3
A    1                    1
B    1      1      1
C                                1      1
D           1                                  1
Figure 9.5: Utilities of 3, 4, and 5 have been replaced by 1, while ratings of 1
and 2 are omitted
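
The rounding step and the resulting Jaccard distances can be checked with a short sketch. The raw ratings are assumed to match Fig. 9.4, which lies outside this section; the names RATINGS, round_ratings, jaccard_distance, and cosine_of_sets are hypothetical and chosen only for illustration.

```python
from math import sqrt

# Raw ratings, assumed to match Fig. 9.4 (not reproduced in this section).
RATINGS = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
    "D": {"HP2": 3, "SW3": 3},
}

def round_ratings(user_ratings, threshold=3):
    """Keep movies rated at or above the threshold; drop the rest as unrated."""
    return {movie for movie, r in user_ratings.items() if r >= threshold}

def jaccard_distance(s, t):
    """1 minus the ratio of intersection size to union size."""
    union = s | t
    if not union:
        return 0.0
    return 1 - len(s & t) / len(union)

def cosine_of_sets(s, t):
    """Cosine of the angle between the 0/1 vectors that represent two sets."""
    if not s or not t:
        return 0.0
    return len(s & t) / sqrt(len(s) * len(t))

rounded = {user: round_ratings(r) for user, r in RATINGS.items()}

d_ab = jaccard_distance(rounded["A"], rounded["B"])  # 0.75
d_ac = jaccard_distance(rounded["A"], rounded["C"])  # 1.0
cos_ab = cosine_of_sets(rounded["A"], rounded["B"])  # 1/sqrt(6), about 0.408
cos_ac = cosine_of_sets(rounded["A"], rounded["C"])  # 0.0, vectors are orthogonal
```

Both measures agree: A and B overlap on HP1, while A and C have no movie in common once the low ratings are dropped.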
Normalizing Ratings
If we normalize ratings by subtracting from each rating the average rating
of that user, we turn low ratings into negative numbers and high ratings into
positive numbers. If we then take the cosine distance, we find that users with
opposite views of the movies they viewed in common will have vectors in almost
opposite directions, and can be considered as far apart as possible. Conversely,
users with similar opinions about the movies rated in common will have a
relatively small angle between them.
Example 9.9: Figure 9.6 shows the matrix of Fig. 9.4 with all ratings
normalized. An interesting effect is that D's ratings have effectively disappeared,
because a 0 is the same as a blank when cosine distance is computed. Note that
D gave only 3's and did not differentiate among movies, so it is quite possible
that D's opinions are not worth taking seriously.
     HP1    HP2    HP3    TW     SW1    SW2    SW3
A    2/3                  5/3    −7/3
B    1/3    1/3    −2/3
C                         −5/3   1/3    4/3
D           0                                  0
Figure 9.6: The utility matrix introduced in Fig. 9.1, with all ratings normalized
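
As a rough check on Example 9.9, the sketch below normalizes each user's ratings and computes the cosine of the angle between pairs of users. The raw ratings are assumed to match Fig. 9.4, which is not reproduced here; the names RATINGS, normalize, and cosine are hypothetical. Blanks are treated as 0, and a user whose normalized vector is all zeros (such as D) is reported as None because the angle is undefined.

```python
from math import sqrt

# Raw ratings, assumed to match Fig. 9.4 (not reproduced in this section).
RATINGS = {
    "A": {"HP1": 4, "TW": 5, "SW1": 1},
    "B": {"HP1": 5, "HP2": 5, "HP3": 4},
    "C": {"TW": 2, "SW1": 4, "SW2": 5},
    "D": {"HP2": 3, "SW3": 3},
}

def normalize(user_ratings):
    """Subtract the user's average rating from each of that user's ratings."""
    mean = sum(user_ratings.values()) / len(user_ratings)
    return {movie: r - mean for movie, r in user_ratings.items()}

def cosine(u, v):
    """Cosine of the angle between two sparse vectors; blanks count as 0."""
    dot = sum(u[m] * v[m] for m in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return None  # zero vector: the angle is undefined
    return dot / (norm_u * norm_v)

normalized = {user: normalize(r) for user, r in RATINGS.items()}

cos_ab = cosine(normalized["A"], normalized["B"])  # about 0.092
cos_ac = cosine(normalized["A"], normalized["C"])  # about -0.559
cos_ad = cosine(normalized["A"], normalized["D"])  # None, D's vector is all zeros
```

Under these assumptions the cosine for A and B is small but positive, while for A and C it is clearly negative, so the normalized vectors place C much farther from A than B is.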