Now, consider Example 9.4. There, we observed that the vector for a user
will have positive numbers for stars that tend to appear in movies the user
likes and negative numbers for stars that tend to appear in movies the user
doesn't like. Consider a movie with many stars the user likes, and only a few or
none that the user doesn't like. The cosine of the angle between the user's and
movie's vectors will be a large positive fraction. That implies an angle close to
0, and therefore a small cosine distance between the vectors.
Next, consider a movie with about as many stars the user likes as doesn't
like. In this situation, the cosine of the angle between the user's and the movie's
vectors is around 0, and therefore the angle between the two vectors is around 90 degrees.
Finally, consider a movie with mostly stars the user doesn't like. In that case,
the cosine will be a large negative fraction, and the angle between the two
vectors will be close to 180 degrees, the maximum possible cosine distance. (That
is, user vectors will tend to be much shorter than movie vectors, but only the
direction of the vectors matters.)
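To make the arithmetic concrete, here is a small sketch in Python. The star
weights and movie vectors below are invented for illustration, and cosine
distance is taken as the angle between the vectors, matching the discussion above:

    import math

    def cosine_distance(u, v):
        # Angle between u and v in degrees: 0 for identical direction, 180 for opposite.
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

    # One component per star: positive where the user likes the star, negative otherwise.
    user = [0.8, 0.5, -0.6, -0.2]
    movie_liked_stars = [1, 1, 0, 0]      # movie featuring the two stars the user likes
    movie_mixed_stars = [0, 1, 1, 1]      # movie with mostly stars the user doesn't like

    print(cosine_distance(user, movie_liked_stars))   # small angle, close to 0
    print(cosine_distance(user, movie_mixed_stars))   # angle beyond 90 degrees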
9.2.7 Classification Algorithms
A completely different approach to a recommendation system using item profiles
and utility matrices is to treat the problem as one of machine learning. Regard
the given data as a training set, and for each user, build a classifier that predicts
the rating of all items. There are a great number of different classifiers, and it
is not our purpose to teach this subject here. However, you should be aware
of the option of developing a classifier for recommendation, so we shall discuss
briefly one common classifier: decision trees.
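For instance, a per-user training set can be assembled from the utility matrix
and the item profiles. The sketch below uses invented data, and the cutoff of 3
stars for "likes" is an assumption made only for illustration:

    # Invented utility matrix (ratings; None means "not rated") and item profiles.
    utility = {
        "Alice": {"Movie 1": 5, "Movie 2": 1, "Movie 3": None, "Movie 4": 4},
    }
    profiles = {
        "Movie 1": {"stars": {"Star A", "Star B"}, "genre": "comedy"},
        "Movie 2": {"stars": {"Star C"}, "genre": "drama"},
        "Movie 3": {"stars": {"Star A"}, "genre": "drama"},
        "Movie 4": {"stars": {"Star B"}, "genre": "comedy"},
    }

    def training_set(user, like_threshold=3):
        # Pairs (item profile, liked?) for the items this user has actually rated.
        items, labels = [], []
        for movie, rating in utility[user].items():
            if rating is not None:
                items.append(profiles[movie])
                labels.append(rating >= like_threshold)
        return items, labels

    items, labels = training_set("Alice")   # "Movie 3" is unrated; it is what we want to predict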
A decision tree is a collection of nodes, arranged as a binary tree. The
leaves render decisions; in our case, the decision would be “likes” or “doesn't
like.” Each interior node is a condition on the objects being classified; in our
case the condition would be a predicate involving one or more features of an
item.
To classify an item, we start at the root, and apply the predicate at the root
to the item. If the predicate is true, go to the left child, and if it is false, go to
the right child. Then repeat the same process at the node visited, until a leaf
is reached. That leaf classifies the item as liked or not.
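A minimal sketch of this procedure, with a node structure and feature predicates
invented for the example:

    # A node is either a leaf carrying a decision, or an interior node carrying a
    # predicate on item features plus "true" and "false" children.
    class Node:
        def __init__(self, predicate=None, left=None, right=None, decision=None):
            self.predicate = predicate    # function item -> bool (interior nodes only)
            self.left = left              # followed when the predicate is true
            self.right = right            # followed when the predicate is false
            self.decision = decision      # "likes" or "doesn't like" (leaves only)

    def classify(node, item):
        # Walk from the root to a leaf, applying the predicate at each interior node.
        while node.decision is None:
            node = node.left if node.predicate(item) else node.right
        return node.decision

    # An invented two-level tree over the item profiles used earlier.
    tree = Node(
        predicate=lambda item: "Star A" in item["stars"],
        left=Node(decision="likes"),
        right=Node(
            predicate=lambda item: item["genre"] == "comedy",
            left=Node(decision="likes"),
            right=Node(decision="doesn't like"),
        ),
    )

    print(classify(tree, {"stars": {"Star C"}, "genre": "drama"}))   # doesn't like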
Construction of a decision tree requires selection of a predicate for each
interior node. There are many ways of picking the best predicate, but they all
try to arrange that one of the children gets all or most of the positive examples
in the training set (i.e., the items that the given user likes, in our case) and the
other child gets all or most of the negative examples (the items this user does
not like).
Once we have selected a predicate for a node N, we divide the items into
the two groups: those that satisfy the predicate and those that do not. For
each group, we again find the predicate that best separates the positive and
negative examples in that group. These predicates are assigned to the children
of N, and the construction continues recursively in the same manner on each group.
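A rough sketch of this construction, reusing the Node class above. The candidate
predicates and the scoring rule (counting items that land on the wrong side of a
split) are assumptions made for illustration; the text leaves the exact way of
picking the best predicate open:

    def split_errors(pred, items, labels):
        # Items on the "wrong" side: dislikes where the predicate holds,
        # likes where it does not.
        true_side = [lab for it, lab in zip(items, labels) if pred(it)]
        false_side = [lab for it, lab in zip(items, labels) if not pred(it)]
        return true_side.count(False) + false_side.count(True)

    def build_tree(items, labels, candidates, depth=0, max_depth=3):
        # Stop with a leaf when the group is pure or the tree is deep enough.
        if depth == max_depth or len(set(labels)) <= 1:
            majority = labels.count(True) >= labels.count(False)
            return Node(decision="likes" if majority else "doesn't like")
        pred = min(candidates, key=lambda p: split_errors(p, items, labels))
        yes = [(it, lab) for it, lab in zip(items, labels) if pred(it)]
        no = [(it, lab) for it, lab in zip(items, labels) if not pred(it)]
        if not yes or not no:
            # The best predicate does not separate anything; fall back to a leaf.
            majority = labels.count(True) >= labels.count(False)
            return Node(decision="likes" if majority else "doesn't like")
        return Node(
            predicate=pred,
            left=build_tree([it for it, _ in yes], [lab for _, lab in yes],
                            candidates, depth + 1, max_depth),
            right=build_tree([it for it, _ in no], [lab for _, lab in no],
                             candidates, depth + 1, max_depth),
        )

    candidates = [
        lambda item: "Star A" in item["stars"],
        lambda item: item["genre"] == "comedy",
    ]
    user_tree = build_tree(items, labels, candidates)   # items, labels from the earlier sketch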