Decision Trees and Recommender Systems - Data Mining with Decision Trees: Theory and Applications

Database Reference

In-Depth Information

First, let us find the probability of an item belonging to the intersection.

The probability of all o i occurrences of any item i to be in the set A is ( q ) o i .

Similarly, the probability for all o i occurrences of an item i to be in the set

B is ( q− q ) o i . In all other cases an item i will appear in the intersection,

thus the probability P i that an item i belongs to items ( A ) items ( B )is:

o i

o i .

−

P i =1

−

(16.1)

Next, we can construct a random variable x i which takes the value

1whenanitem i belongs to the intersection of A and B , and otherwise

the value is 0. Using the above equation, the variable x i is distributed

according to a Bernoulli distribution with a success probability P i .Thus,

S q is distributed as the sum of

non-identically distributed

Bernoulli random variables which can be approximated by a Poisson

distribution:

items ( Ratings )

Pr ( S q = j )= λ j

e −λ

j !

(16.2)

where

λ =

i∈items ( Ratings )

P i .

(16.3)

The cumulative distribution function (CDF) is therefore given by:

k +1

,λ )

j )= Γ(

Pr ( S q ≤

(16.4)

where Γ( x, y ) is the incomplete gamma function and

is the floor

function.

To summarize, given a binary split of Ratings into two sub-relations A

and B , of sizes q and q

k respectively. Our proposed heuristic initially

computes the size of the item-intersection,

−

( items ( A ) items ( B )

= j .

Next, we compute the probability of receiving such intersection size in

a similar-size random split using the probability Pr ( S q ≤

j ). Out of

all possible splits of Ratings , our heuristic picks the one with the lowest

probability Pr ( S q ≤

j ) to be the next split in the tree.

16.3 Using Decision Trees for Preferences Elicitation

Most RS methods, such as collaborative filtering, cannot provide personal-

ized recommendations until a user profile has been created. This is known as

the new user cold-start problem. Several systems try to learn the new users'

Data Mining with Decision Trees: Theory and Applications

Search WWH ::

Custom Search

Home