the rating of an item, the decision tree would return a weighted list
of recommended items. Thus, with just a single traversal of the tree,
recommendations can be constructed and provided to the user. This
variation of decision-tree-based RS is described in Section 16.2.1.
Gershman et al. (2010) also introduce a new heuristic criterion for
building the decision tree. Instead of picking as the split attribute the
attribute that produces the largest information gain ratio, the proposed
heuristic looks at the number of items shared between the divided sets. The
split attribute with the lowest probability of producing its number of
shared items, when compared to a random split, is picked as the split
attribute. This heuristic is described in further detail in Section 16.2.2.
A comparison study shows that the new criterion outperforms the well-known
information gain criterion in terms of predictive performance.
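As a rough illustration of this idea, the following Python sketch estimates, by sampling, how likely a random split of the same sizes is to produce at most as many shared items as a candidate split does. The names here (shared_items, random_split_probability, the in_left predicate) are hypothetical, and the exact probability computation used by the heuristic is the subject of Section 16.2.2; this sampling version is only meant to convey the intuition.

    import random

    def shared_items(ratings_a, ratings_b):
        # Items that appear on both sides of a split of the ratings set.
        items_a = {item for (item, user, rating) in ratings_a}
        items_b = {item for (item, user, rating) in ratings_b}
        return len(items_a & items_b)

    def random_split_probability(ratings, in_left, n_trials=1000):
        # Monte Carlo estimate of the probability that a random split of
        # the same sizes shares at most as many items as the candidate
        # split induced by the predicate in_left. (Illustrative only; the
        # heuristic itself computes this probability analytically.)
        left = [r for r in ratings if in_left(r)]
        right = [r for r in ratings if not in_left(r)]
        observed = shared_items(left, right)
        pool = list(ratings)
        hits = 0
        for _ in range(n_trials):
            random.shuffle(pool)
            if shared_items(pool[:len(left)], pool[len(left):]) <= observed:
                hits += 1
        return hits / n_trials

Under this reading, the candidate attribute whose split yields the lowest such probability, i.e., the most surprising separation of items relative to chance, would be chosen as the split attribute.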
16.2.1 RS-Adapted Decision Tree
In recommender systems the input set for building the decision tree
is composed of Ratings. Ratings can be described as a relation
<ItemID, UserID, Rating> (in which <ItemID, UserID> is assumed to be
a primary key). The users can be described by means of various attributes,
such as age, gender, and occupation. Other attributes can describe
the items, for example weight, price, and dimensions. Rating
is the target attribute. Based on the training set, the system attempts
to predict the Rating of items for which the user has no Rating, and
recommends to the user the items with the highest predicted Rating.
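To make this data layout concrete, here is a minimal Python sketch of the relation and of recommending the top-N unrated items. The function predict_rating is a hypothetical stand-in for whatever model (such as the decision tree) has been trained on the ratings; the other names are likewise illustrative.

    from typing import Callable, Iterable

    Rating = tuple[str, str, float]  # (ItemID, UserID, Rating)

    def recommend(user_id: str,
                  all_items: Iterable[str],
                  ratings: set[Rating],
                  predict_rating: Callable[[str, str], float],
                  top_n: int = 10) -> list[str]:
        # Recommend the unrated items with the highest predicted Rating.
        rated = {item for (item, user, _) in ratings if user == user_id}
        candidates = [i for i in all_items if i not in rated]
        candidates.sort(key=lambda i: predict_rating(i, user_id), reverse=True)
        return candidates[:top_n]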
The construction of a decision tree is performed by a recursive process.
The process starts at the root node with an input set (training set). At each
node an item attribute is picked as the split attribute. For each possible
value (or set of values) a child node is created, and the parent's set is
split among the child nodes so that each child node receives as its input
set all the items whose value(s) correspond to it. Picking the split
attribute is done heuristically, since we cannot know which split will
produce the best tree (the tree that yields the best results on future
input); for example, the popular C4.5 algorithm uses a heuristic that picks
the split producing the largest information gain out of all possible splits.
One of the attributes is predefined as the target attribute. The recursive
process continues until all the items in a node's set share the same target
attribute value or the number of items drops below a certain threshold. Each
leaf node is assigned a label (classifying its set of items); this label is
the target attribute value shared by the items in its set, or the most
common value among them.
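The recursive construction and the two stopping conditions can be sketched as follows. This is a minimal, generic version assuming training examples are dicts mapping attribute names to values, with min_size standing in for the size threshold mentioned above and the split chosen by C4.5's gain ratio; it is not the paper's exact algorithm.

    import math
    from collections import Counter

    def entropy(values):
        counts = Counter(values)
        total = len(values)
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def gain_ratio(rows, attr, target):
        # C4.5's criterion: information gain normalized by the split's
        # intrinsic information.
        base = entropy([r[target] for r in rows])
        groups = {}
        for r in rows:
            groups.setdefault(r[attr], []).append(r)
        cond = sum(len(g) / len(rows) * entropy([r[target] for r in g])
                   for g in groups.values())
        split_info = entropy([r[attr] for r in rows])
        return (base - cond) / split_info if split_info > 0 else 0.0

    def build_tree(rows, attrs, target, min_size=5):
        labels = [r[target] for r in rows]
        # Stop when the set is pure, too small, or no attributes remain;
        # the leaf label is the shared (or most common) target value.
        if len(set(labels)) == 1 or len(rows) <= min_size or not attrs:
            return Counter(labels).most_common(1)[0][0]
        best = max(attrs, key=lambda a: gain_ratio(rows, a, target))
        children = {}
        for value in {r[best] for r in rows}:
            subset = [r for r in rows if r[best] == value]
            children[value] = build_tree(subset, attrs - {best}, target, min_size)
        return (best, children)

A leaf here returns a single label; the RS-adapted variant of Section 16.2.1 would instead store a weighted list of recommended items at each leaf, so that one traversal yields the recommendations directly.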