the rating of an item, the decision tree would return a weighted list
of recommended items. Thus, with just a single traversal of the tree,
recommendations can be constructed and provided to the user. This
variation of decision-tree-based RS is described in Section 16.2.1.
Gershman et al. (2010) also introduce a new heuristic criterion for
building the decision tree. Instead of picking as the split attribute the
attribute that produces the largest information gain ratio, the proposed
heuristic looks at the number of items shared between the divided sets. The
split attribute with the lowest probability of producing its number of
shared items, when compared to a random split, is picked as the split
attribute. This heuristic is described in further detail in Section 16.2.2.
A comparison study shows that the new criterion outperforms the well-known
information gain criterion in terms of predictive performance.
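As a rough illustration of this idea, the following Python sketch estimates, by sampling, how likely a random split of the same sizes is to produce at most as many shared items as a candidate split does. The names here (shared_items, random_split_probability, the in_left predicate) are hypothetical, and the exact probability computation used by the heuristic is the subject of Section 16.2.2; this sampling version is only meant to convey the intuition.

    import random

    def shared_items(ratings_a, ratings_b):
        # Items that appear on both sides of a split of the ratings set.
        items_a = {item for (item, user, rating) in ratings_a}
        items_b = {item for (item, user, rating) in ratings_b}
        return len(items_a & items_b)

    def random_split_probability(ratings, in_left, n_trials=1000):
        # Monte Carlo estimate of the probability that a random split of
        # the same sizes shares at most as many items as the candidate
        # split induced by the predicate in_left. (Illustrative only; the
        # heuristic itself computes this probability analytically.)
        left = [r for r in ratings if in_left(r)]
        right = [r for r in ratings if not in_left(r)]
        observed = shared_items(left, right)
        pool = list(ratings)
        hits = 0
        for _ in range(n_trials):
            random.shuffle(pool)
            if shared_items(pool[:len(left)], pool[len(left):]) <= observed:
                hits += 1
        return hits / n_trials

Under this reading, the candidate attribute whose split yields the lowest such probability, i.e., the most surprising separation of items relative to chance, would be chosen as the split attribute.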
16.2.1 RS-Adapted Decision Tree
In recommender systems the input set for building the decision tree
is composed of Ratings. Ratings can be described as a relation
<ItemID, UserID, Rating> (in which <ItemID, UserID> is assumed to be
a primary key). The users can be described by means of various attributes,
such as age, gender, and occupation. Other attributes can describe
the items, for example weight, price, and dimensions. Rating
is the target attribute. Based on the training set, the system attempts
to predict the Rating of items for which the user has no Rating, and
recommends to the user the items with the highest predicted Rating.
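To make this data layout concrete, here is a minimal Python sketch of the relation and of recommending the top-N unrated items. The function predict_rating is a hypothetical stand-in for whatever model (such as the decision tree) has been trained on the ratings; the other names are likewise illustrative.

    from typing import Callable, Iterable

    Rating = tuple[str, str, float]  # (ItemID, UserID, Rating)

    def recommend(user_id: str,
                  all_items: Iterable[str],
                  ratings: set[Rating],
                  predict_rating: Callable[[str, str], float],
                  top_n: int = 10) -> list[str]:
        # Recommend the unrated items with the highest predicted Rating.
        rated = {item for (item, user, _) in ratings if user == user_id}
        candidates = [i for i in all_items if i not in rated]
        candidates.sort(key=lambda i: predict_rating(i, user_id), reverse=True)
        return candidates[:top_n]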
The construction of a decision tree is performed by a recursive process.
The process starts at the root node with an input set (training set). At each
node an item attribute is picked as the split attribute. For each possible
value (or set of values) a child node is created, and the parent's set is
split among the child nodes so that each child node receives as its input
set all the items whose value(s) correspond to it. Picking the split
attribute is done heuristically, since we cannot know which split will
produce the best tree (the tree that yields the best results on future
input); for example, the popular C4.5 algorithm uses a heuristic that picks
the split producing the largest information gain out of all possible splits.
One of the attributes is predefined as the target attribute. The recursive
process continues until all the items in a node's set share the same target
attribute value or the number of items drops below a certain threshold. Each
leaf node is assigned a label (classifying its set of items); this label is
the target attribute value shared by the items in its set, or the most
common value among them.
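The recursive construction and the two stopping conditions can be sketched as follows. This is a minimal, generic version assuming training examples are dicts mapping attribute names to values, with min_size standing in for the size threshold mentioned above and the split chosen by C4.5's gain ratio; it is not the paper's exact algorithm.

    import math
    from collections import Counter

    def entropy(values):
        counts = Counter(values)
        total = len(values)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def gain_ratio(rows, attr, target):
        # C4.5's criterion: information gain normalized by the split's
        # intrinsic information.
        base = entropy([r[target] for r in rows])
        groups = {}
        for r in rows:
            groups.setdefault(r[attr], []).append(r)
        cond = sum(len(g) / len(rows) * entropy([r[target] for r in g])
                   for g in groups.values())
        split_info = entropy([r[attr] for r in rows])
        return (base - cond) / split_info if split_info > 0 else 0.0

    def build_tree(rows, attrs, target, min_size=5):
        labels = [r[target] for r in rows]
        # Stop when the set is pure, too small, or no attributes remain;
        # the leaf label is the shared (or most common) target value.
        if len(set(labels)) == 1 or len(rows) <= min_size or not attrs:
            return Counter(labels).most_common(1)[0][0]
        best = max(attrs, key=lambda a: gain_ratio(rows, a, target))
        children = {}
        for value in {r[best] for r in rows}:
            subset = [r for r in rows if r[best] == value]
            children[value] = build_tree(subset, attrs - {best}, target, min_size)
        return (best, children)

A leaf here returns a single label; the RS-adapted variant of Section 16.2.1 would instead store a weighted list of recommended items at each leaf, so that one traversal yields the recommendations directly.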