attributes until the leaf node is reached. The leaf contains a pre-computed
recommendation list, which is returned to the user. Thus, the time
complexity is reduced to O(h).
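The lookup itself can be sketched in a few lines of Python. The node structure and attribute names below are illustrative assumptions, not the chapter's actual implementation; the point is only that each internal node tests one user-profile attribute, so a single root-to-leaf walk of length at most h suffices.

```python
# Hypothetical sketch of the O(h) lookup: descend from the root, testing one
# user-profile attribute per level, until a leaf whose pre-computed
# recommendation list is returned.

class Node:
    def __init__(self, attribute=None, threshold=None,
                 left=None, right=None, recommendations=None):
        self.attribute = attribute              # e.g. "likes_comedy" (illustrative)
        self.threshold = threshold              # split value for that attribute
        self.left = left                        # subtree for values <= threshold
        self.right = right                      # subtree for values > threshold
        self.recommendations = recommendations  # pre-computed list (leaves only)

def recommend(root, user_profile):
    """Walk the tree using the user's attribute values; O(h) for a tree of height h."""
    node = root
    while node.recommendations is None:         # still at an internal node
        if user_profile[node.attribute] > node.threshold:
            node = node.right
        else:
            node = node.left
    return node.recommendations
```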
Building the recommendation list at the leaf node can be done in
various ways. We selected the following solution: compute the weighted
average of the ratings over all tuples in the leaf's Ratings set and prefer
the items with the highest weighted average. Consider the rightmost leaf in
the example tree in Figure 16.3, and any tuple <i, u, r> in its Ratings set
(i, u, and r denote an item, a user, and a rating, respectively). At this leaf node
we know that u's degree of liking the comedy genre is more than 0.5 and that
u's degree of liking movies with the actor George Clooney is more than 0.7.
All the items rated by users such as u appear in this Ratings set alongside
the ratings each user submitted. This leaf therefore contains the ratings of
users similar to u. If we now pick the items that were rated the highest by
the users similar to u, these would form a good recommendation for this
user. Thus, we order all items by their weighted average rating (an item
can appear more than once when it was rated by more than one user) and set
this list as the recommendation list for this leaf node. The algorithm is
presented in detail in Figure 16.4.
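The following sketch illustrates the idea under a simplifying assumption: each item's score is the plain average of all ratings it received in the leaf's Ratings set, so items rated by many similar users contribute several tuples. The exact weighting used by the algorithm is the one given in Figure 16.4; this is only an approximation of it.

```python
from collections import defaultdict

def leaf_recommendation_list(ratings):
    """Order the items of a leaf's Ratings set by average rating, best first.

    `ratings` is an iterable of (item, user, rating) tuples, i.e. the tuples
    stored at the leaf. An item appears once per user who rated it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for item, user, rating in ratings:
        sums[item] += rating
        counts[item] += 1
    averages = {item: sums[item] / counts[item] for item in sums}
    # Highest-scored items first form the leaf's recommendation list.
    return sorted(averages, key=averages.get, reverse=True)
```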
16.2.2 Least Probable Intersections
Based on the model constructed with the training set, decision trees seek
to provide good results for new unseen cases. To accomplish this, the
construction algorithm strives for a small tree (in terms of the number of
nodes) that performs well on the training set. It is believed that a small tree
generalizes better, avoids overfitting, and forms a simpler representation
for users to understand. C4.5 is a popular algorithm for constructing
such decision trees. It employs the criterion of normalized information gain
to pick the attribute used to split each node. The attribute with
the largest normalized information gain is picked, as it provides the largest
reduction in entropy. Since this is a heuristic, it does not guarantee the
smallest possible tree.
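For reference, the sketch below shows C4.5's criterion (this is the standard criterion the chapter contrasts against, not the heuristic it proposes). The dictionary-based representation of examples is an illustrative assumption; normalized information gain, or gain ratio, is the information gain divided by the split information of the candidate attribute.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def gain_ratio(examples, attribute, label):
    """C4.5's normalized information gain for splitting `examples`
    (a list of dicts) on `attribute`, with respect to the class `label`."""
    base = entropy([e[label] for e in examples])
    # Partition the examples by the value of the candidate attribute.
    partitions = {}
    for e in examples:
        partitions.setdefault(e[attribute], []).append(e[label])
    remainder = sum(len(p) / len(examples) * entropy(p)
                    for p in partitions.values())
    gain = base - remainder
    # Split information: entropy of the attribute-value distribution itself.
    split_info = entropy([e[attribute] for e in examples])
    return gain / split_info if split_info > 0 else 0.0
```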
In RS, the input set for building the decision tree is composed of Ratings.
Ratings can be described as a relation <ItemID, UserID, Rating> (in
which <ItemID, UserID> is assumed to be a primary key). The intuition
behind the heuristic proposed in this chapter is as follows. Consider a split
into two subsets, A and B. The fewer ItemIDs are shared between the sets A
and B, the better the split is, since it forms a better distinction between