Databases Reference
In-Depth Information
which a user will like an item by the closeness of the item's profile to the
user's profile.
F
Classification of Items: An alternative to constructing a user profile is to
build a classifier for each user, e.g., a decision tree. The row of the utility
matrix for that user becomes the training data, and the classifier must
predict the response of the user to all items, whether or not the row had
an entry for that item.
F
Similarity of Rows and Columns of the Utility Matrix : Collaborative fil-
tering algorithms must measure the similarity of rows and/or columns
of the utility matrix. Jaccard distance is appropriate when the matrix
consists only of 1's and blanks (for “not rated”). Cosine distance works
for more general values in the utility matrix. It is often useful to normal-
ize the utility matrix by subtracting the average value (either by row, by
column, or both) before measuring the cosine distance.
F
Clustering Users and Items: Since the utility matrix tends to be mostly
blanks, distance measures such as Jaccard or cosine often have too little
data with which to compare two rows or two columns. A preliminary
step or steps, in which similarity is used to cluster users and/or items
into small groups with strong similarity, can help provide more common
components with which to compare rows or columns.
F
UV-Decomposition: One way of predicting the blank values in a utility
matrix is to find two long, thin matrices U and V , whose product is an
approximation to the given utility matrix. Since the matrix product U V
gives values for all user-item pairs, that value can be used to predict the
value of a blank in the utility matrix. The intuitive reason this method
makes sense is that often there are a relatively small number of issues (that
number is the “thin” dimension of U and V ) that determine whether or
not a user likes an item.
F
Root-Mean-Square Error : A good measure of how close the product U V
is to the given utility matrix is the RMSE (root-mean-square error). The
RMSE is computed by averaging the square of the differences between
U V and the utility matrix, in those elements where the utility matrix is
nonblank. The square root of this average is the RMSE.
F
Computing U and V : One way of finding a good choice for U and V in a
UV-decomposition is to start with arbitrary matrices U and V . Repeat-
edly adjust one of the elements of U or V to minimize the RMSE between
the product U V and the given utility matrix. The process converges to
a local optimum, although to have a good chance of obtaining a global
optimum we must either repeat the process from many starting matrices,
or search from the starting point in many different ways.
Search WWH ::




Custom Search