was cited as an example. While the data does not reveal how long the delay between
viewing and rating was, it is generally safe to assume that most people see a movie
shortly after it comes out. Thus, one can examine the
ratings of any movie to see if its ratings have an upward or downward slope with
time.
9.6 Summary of Chapter 9
Utility Matrices : Recommendation systems deal with users and items. A utility matrix offers known information
about the degree to which a user likes an item. Normally, most entries are unknown, and the essential problem of
recommending items to users is predicting the values of the unknown entries based on the values of the known entries.
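As a minimal sketch of the idea, a small utility matrix can be held as a dictionary of rows, with None standing for an unknown entry. The user names, item names, ratings, and helper functions below are hypothetical and chosen only for illustration.

```python
# Hypothetical ratings on a 1-5 scale; None marks an unknown (blank) entry.
utility = {
    "alice": {"m1": 4, "m2": None, "m3": 5},
    "bob": {"m1": None, "m2": 2, "m3": None},
}


def unknown_entries(matrix):
    """Return the (user, item) pairs whose values would have to be predicted."""
    return [
        (user, item)
        for user, row in matrix.items()
        for item, rating in row.items()
        if rating is None
    ]


def density(matrix):
    """Return the fraction of entries that are known."""
    total = sum(len(row) for row in matrix.values())
    known = sum(
        1 for row in matrix.values() for rating in row.values() if rating is not None
    )
    return known / total
```

In a realistic utility matrix the density would be far lower than in this toy case, which is why a sparse representation is usually preferred.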
Two Classes of Recommendation Systems : These systems attempt to predict a user's response to an item by
discovering similar items and the response of the user to those. One class of recommendation system is content-based; it
measures similarity by looking for common features of the items. A second class of recommendation system uses
collaborative filtering; it measures similarity of users by their item preferences and/or measures similarity of items
by the users who like them.
Item Profiles : These consist of features of items. Different kinds of items have different features on which
content-based similarity can be based. Features of documents are typically important or unusual words. Products have
attributes such as screen size for a television. Media such as movies have a genre and details such as actor or performer.
Tags can also be used as features if they can be acquired from interested users.
User Profiles : A content-based system can construct profiles for users by measuring the
frequency with which features appear in the items the user likes. We can then estimate the degree to which a user will
like an item by the closeness of the item's profile to the user's profile.
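The profile construction above can be sketched as follows, assuming items are described by sets of boolean features and that closeness is measured by the cosine of the angle between profiles. The feature names, item names, and function names are hypothetical; other closeness measures would also fit the description.

```python
import math


def user_profile(liked_items, item_features):
    """Fraction of the user's liked items in which each feature appears."""
    counts = {}
    for item in liked_items:
        for feature in item_features[item]:
            counts[feature] = counts.get(feature, 0) + 1
    n = len(liked_items)
    return {feature: count / n for feature, count in counts.items()}


def item_profile(item, item_features):
    """Represent an item as a 0/1 vector, storing only the features it has."""
    return {feature: 1.0 for feature in item_features[item]}


def cosine_similarity(p, q):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(value * q.get(key, 0.0) for key, value in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical item profiles.
item_features = {
    "m1": {"comedy", "actor_a"},
    "m2": {"comedy", "actor_b"},
    "m3": {"drama", "actor_b"},
}

profile = user_profile(["m1", "m2"], item_features)
score_m1 = cosine_similarity(profile, item_profile("m1", item_features))
score_m3 = cosine_similarity(profile, item_profile("m3", item_features))
```

Here the user likes two comedies, so the comedy "m1" scores closer to the user's profile than the drama "m3".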
Classification of Items : An alternative to constructing a user profile is to build a classifier for each user, e.g., a
decision tree. The row of the utility matrix for that user becomes the training data, and the classifier must predict the
response of the user to all items, whether or not the row had an entry for that item.
Similarity of Rows and Columns of the Utility Matrix : Collaborative filtering algorithms must measure the similarity
of rows and/or columns of the utility matrix. Jaccard distance is appropriate when the matrix consists only of 1s and
blanks (for “not rated”). Cosine distance works for more general values in the utility matrix. It is often useful to
normalize the utility matrix by subtracting the average value (either by row, by column, or both) before measuring
the cosine distance.
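The two measures can be sketched as below, assuming each row is stored sparsely as a dictionary that holds only its nonblank entries, and that blanks are treated as 0 when taking the cosine. The rows A, B, and C are illustrative data, and the function names are hypothetical. The sketch returns the cosine of the angle itself; a larger cosine corresponds to a smaller distance.

```python
import math


def jaccard_distance(row_a, row_b):
    """1 - |A and B| / |A or B|, over the sets of items present in each row."""
    a, b = set(row_a), set(row_b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def subtract_row_mean(row):
    """Normalize a row by subtracting the average of its known values."""
    mean = sum(row.values()) / len(row)
    return {item: value - mean for item, value in row.items()}


def cosine(row_a, row_b):
    """Cosine of the angle between two sparse rows; blanks count as 0."""
    dot = sum(value * row_b.get(item, 0.0) for item, value in row_a.items())
    norm_a = math.sqrt(sum(v * v for v in row_a.values()))
    norm_b = math.sqrt(sum(v * v for v in row_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative rows of a utility matrix (only nonblank entries are stored).
A = {"HP1": 4, "TW": 5, "SW1": 1}
B = {"HP1": 5, "HP2": 5, "HP3": 4}
C = {"TW": 2, "SW1": 4, "SW2": 5}

cos_ab = cosine(subtract_row_mean(A), subtract_row_mean(B))
cos_ac = cosine(subtract_row_mean(A), subtract_row_mean(C))
```

With these rows, Jaccard distance rates A as closer to C than to B because it ignores the rating values, while the normalized cosine is positive for A and B and negative for A and C, reflecting that A and C rated their common items in opposite ways.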
Clustering Users and Items : Since the utility matrix tends to be mostly blanks, distance measures such as Jaccard
or cosine often have too little data with which to compare two rows or two columns. A preliminary step or steps, in
which similarity is used to cluster users and/or items into small groups with strong similarity, can help provide more
common components with which to compare rows or columns.
UV-Decomposition : One way of predicting the blank values in a utility matrix is to find two long, thin matrices U
and V, whose product is an approximation to the given utility matrix. Since the matrix product UV gives values for
all user-item pairs, that value can be used to predict the value of a blank in the utility matrix. The intuitive reason this
method makes sense is that often there are a relatively small number of issues (that number is the “thin” dimension
of U and V) that determine whether or not a user likes an item.
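To make the prediction step concrete, here is a minimal sketch in which U has one row per user, V has one column per item, and the shared "thin" dimension is 2. The matrices hold made-up values and the function name is hypothetical; how U and V are found is a separate question.

```python
def predict(U, V, user, item):
    """Entry (user, item) of the product UV: row `user` of U dotted with
    column `item` of V."""
    return sum(U[user][k] * V[k][item] for k in range(len(V)))


# Hypothetical factors: 3 users x 2 and 2 x 4 items.
U = [
    [1.0, 0.5],
    [0.0, 2.0],
    [1.0, 1.0],
]
V = [
    [2.0, 4.0, 1.0, 3.0],
    [2.0, 0.0, 1.0, 1.0],
]

# The product is defined for every user-item pair, including pairs that are
# blank in the utility matrix.
full_prediction = [
    [predict(U, V, u, i) for i in range(len(V[0]))] for u in range(len(U))
]
```

Because only 3 x 2 + 2 x 4 numbers are stored, the factors are much smaller than the full matrix once the number of users and items grows.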
Root-Mean-Square Error : A good measure of how close the product UV is to the given utility matrix is the RMSE
(root-mean-square error). The RMSE is computed by averaging the square of the differences between UV and the
utility matrix, in those elements where the utility matrix is nonblank. The square root of this average is the RMSE.
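The computation just described can be sketched as follows, assuming the utility matrix M is a list of rows with None for blanks and that U and V are lists of lists. The function name and the sample values are hypothetical.

```python
import math


def rmse(M, U, V):
    """Root-mean-square error between UV and M over the nonblank entries of M."""
    squared_error = 0.0
    count = 0
    for r, row in enumerate(M):
        for c, known in enumerate(row):
            if known is None:
                continue  # blanks do not contribute to the error
            predicted = sum(U[r][k] * V[k][c] for k in range(len(V)))
            squared_error += (known - predicted) ** 2
            count += 1
    if count == 0:
        raise ValueError("utility matrix has no nonblank entries")
    return math.sqrt(squared_error / count)


# Hypothetical 2 x 2 utility matrix with two blanks, and rank-1 factors.
M = [
    [5, None],
    [None, 3],
]
U = [[1.0], [1.0]]
V = [[4.0, 4.0]]

error = rmse(M, U, V)
```

Here UV predicts 4 everywhere, so the two known entries are off by 1 in opposite directions and the RMSE is 1.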
Computing U and V : One way of finding a good choice for U and V in a UV-decomposition is to start with arbitrary
matrices U and V. Repeatedly adjust one of the elements of U or V to minimize the RMSE between the product UV