a movie with mostly actors the user doesn't like. In that case, the cosine will be a large
negative fraction, and the angle between the two vectors will be close to 180 degrees - the
maximum possible cosine distance.
9.2.7 Classification Algorithms
A completely different approach to a recommendation system using item profiles and utility
matrices is to treat the problem as one of machine learning. Regard the given data as a
training set, and for each user, build a classifier that predicts the rating of all items. There
are a great number of different classifiers, and it is not our purpose to teach this subject
here. However, you should be aware of the option of developing a classifier for
recommendation, so we shall discuss one common classifier - decision trees - briefly.
A decision tree is a collection of nodes, arranged as a binary tree. The leaves render
decisions; in our case, the decision would be “likes” or “doesn't like.” Each interior node is
a condition on the objects being classified; in our case the condition would be a predicate
involving one or more features of an item.
To classify an item, we start at the root, and apply the predicate at the root to the item. If
the predicate is true, go to the left child, and if it is false, go to the right child. Then repeat
the same process at the node visited, until a leaf is reached. That leaf classifies the item as
liked or not.
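
The traversal is simple to code. Below is a minimal Python sketch; the Node class, the feature dictionaries, and the example predicate (a user who likes movies featuring Julia Roberts) are illustrative assumptions, not something fixed by the text.

class Node:
    """One node of a binary decision tree: interior nodes hold a predicate,
    leaves hold the decision."""
    def __init__(self, predicate=None, left=None, right=None, label=None):
        self.predicate = predicate   # function item -> bool (interior nodes only)
        self.left = left             # child for items satisfying the predicate
        self.right = right           # child for items that do not
        self.label = label           # "likes" / "doesn't like" (leaves only)

def classify(tree, item):
    """Walk from the root to a leaf, applying each node's predicate to the item."""
    node = tree
    while node.label is None:                 # interior node: test its predicate
        node = node.left if node.predicate(item) else node.right
    return node.label                         # leaf: the decision

# Hypothetical one-level tree for a user who likes movies with Julia Roberts.
tree = Node(predicate=lambda movie: "Julia Roberts" in movie["actors"],
            left=Node(label="likes"),
            right=Node(label="doesn't like"))
print(classify(tree, {"actors": {"Julia Roberts", "Richard Gere"}}))  # likes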
Construction of a decision tree requires selection of a predicate for each interior node.
There are many ways of picking the best predicate, but they all try to arrange that one of the
children gets all or most of the positive examples in the training set (i.e., the items that the
given user likes, in our case) and the other child gets all or most of the negative examples
(the items this user does not like).
Once we have selected a predicate for a node N, we divide the items into the two groups:
those that satisfy the predicate and those that do not. For each group, we again find the
predicate that best separates the positive and negative examples in that group. These
predicates are assigned to the children of N. This process of dividing the examples and building
children can proceed to any number of levels. We can stop, and create a leaf, if the group
of items for a node is homogeneous; i.e., they are all positive or all negative examples.
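
The recursion can be sketched as follows, reusing the Node class from the earlier sketch. It stops only at homogeneous groups, as just described. The best_predicate scorer shown here (count how many examples each split places in its child's majority class) is only one plausible way to pick the "best" predicate; the text leaves the scoring method open.

def best_predicate(items, labels, candidates):
    """Score each candidate predicate by how many examples land in the
    majority class of the child it sends them to; higher is a cleaner split."""
    def score(pred):
        yes = [lab for it, lab in zip(items, labels) if pred(it)]
        no = [lab for it, lab in zip(items, labels) if not pred(it)]
        return (max(sum(yes), len(yes) - sum(yes)) +
                max(sum(no), len(no) - sum(no)))
    return max(candidates, key=score)

def build_tree(items, labels, candidates):
    """items: feature dicts; labels: parallel booleans (True = user likes it);
    candidates: list of predicates to choose from."""
    # Stop at a homogeneous group: all liked or all disliked.
    if all(labels) or not any(labels):
        return Node(label="likes" if labels and labels[0] else "doesn't like")
    pred = best_predicate(items, labels, candidates)
    yes_items, yes_labels, no_items, no_labels = [], [], [], []
    for item, label in zip(items, labels):
        if pred(item):
            yes_items.append(item); yes_labels.append(label)
        else:
            no_items.append(item); no_labels.append(label)
    # If even the best predicate fails to split the group, fall back to a
    # majority-decision leaf rather than recursing forever.
    if not yes_items or not no_items:
        return Node(label="likes" if 2 * sum(labels) >= len(labels) else "doesn't like")
    return Node(predicate=pred,
                left=build_tree(yes_items, yes_labels, candidates),
                right=build_tree(no_items, no_labels, candidates))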
However, we may wish to stop and create a leaf with the majority decision for a group,
even if the group contains both positive and negative examples. The reason is that the
statistical significance of a small group may not be high enough to rely on. For that reason a
variant strategy is to create an ensemble of decision trees, each using different predicates,
but allow the trees to be deeper than what the available data justifies. Such trees are called
overfitted. To classify an item, apply all the trees in the ensemble, and let them vote on the
outcome.
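
One way to realize such an ensemble, again building on the sketches above, is to grow each tree from a different random subset of the candidate predicates and then take a majority vote. The subset-sampling scheme and the parameters below are assumptions for illustration, not a prescription from the text.

import random

def build_ensemble(items, labels, candidates, n_trees=5, seed=0):
    """Grow n_trees (possibly overfitted) trees, each from a different
    random subset of the candidate predicates."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        subset = rng.sample(candidates, k=max(1, len(candidates) // 2))
        trees.append(build_tree(items, labels, subset))
    return trees

def ensemble_classify(trees, item):
    """Each tree classifies the item; the majority decision wins."""
    votes = [classify(tree, item) for tree in trees]
    return max(set(votes), key=votes.count)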