Decision Trees and Recommender Systems - Data Mining with Decision Trees: Theory and Applications - page 258

Database Reference

In-Depth Information

Fig. 16.4 A modified decision tree for recommender systems.

the two groups. However, different splits may result in different group

sizes. Comparing the size of the intersection between splits of different

sub-relation sizes would not be sucient since an even split for example

(half the tuples in group A and half in group B ) would probably have a

larger intersection than a very uneven split (such as one tuple in group

A and all the rest in group B ). Instead, we look at the probability of our

split's item-intersection size compared to a random (uniform) split of similar

group sizes. A split which is very likely to occur even in a random split is

considered a bad split (less preferred) since it is similar to a random split

and probably does not distinguish the groups from each other. A split that

is the least probable to occur is assumed to be a better distinction of the

two subgroups and is the split selected by the heuristic.

More formally let us denote:

•

items ( S )= π ItemID ( S ), where π is the projection operation in relational

algebra, the set of all ItemIDs that appear in a set S .

•

O i ( S )=

,where σ is the select operation in relation

algebra, is the number of occurrences of the ItemID i in the set S .

|

σ ItemID = i ( S )

|

Let S q ( q denotes the number of ratings) be a random binary partition

of the tuples in Ratings into two sub-relations A and B consisting of k and

q

k tuples, respectively. We are interested in the probability distribution

of S q ≡|

−

( items ( A ) items ( B )

|

.

Next Page

Data Mining with Decision Trees: Theory and Applications

Search WWH ::

Custom Search

Home