Database Reference
In-Depth Information
First, let us find the probability of an item belonging to the intersection.
The probability of all o i occurrences of any item i to be in the set A is ( q ) o i .
Similarly, the probability for all o i occurrences of an item i to be in the set
B is ( q− q ) o i . In all other cases an item i will appear in the intersection,
thus the probability P i that an item i belongs to items ( A ) items ( B )is:
k
q
o i
q
o i .
k
P i =1
(16.1)
q
Next, we can construct a random variable x i which takes the value
1whenanitem i belongs to the intersection of A and B , and otherwise
the value is 0. Using the above equation, the variable x i is distributed
according to a Bernoulli distribution with a success probability P i .Thus,
S q is distributed as the sum of
non-identically distributed
Bernoulli random variables which can be approximated by a Poisson
distribution:
|
items ( Ratings )
|
Pr ( S q = j )= λ j
e −λ
j !
·
,
(16.2)
where
λ =
i∈items ( Ratings )
P i .
(16.3)
The cumulative distribution function (CDF) is therefore given by:
k +1
)
j )= Γ(
Pr ( S q
,
(16.4)
k
!
where Γ( x, y ) is the incomplete gamma function and
k
is the floor
function.
To summarize, given a binary split of Ratings into two sub-relations A
and B , of sizes q and q
k respectively. Our proposed heuristic initially
computes the size of the item-intersection,
( items ( A ) items ( B )
= j .
Next, we compute the probability of receiving such intersection size in
a similar-size random split using the probability Pr ( S q
|
|
j ). Out of
all possible splits of Ratings , our heuristic picks the one with the lowest
probability Pr ( S q
j ) to be the next split in the tree.
16.3 Using Decision Trees for Preferences Elicitation
Most RS methods, such as collaborative filtering, cannot provide personal-
ized recommendations until a user profile has been created. This is known as
the new user cold-start problem. Several systems try to learn the new users'
Search WWH ::




Custom Search