Database Reference
In-Depth Information
I is the K dimensional identity matrix, and a , b ,and c are real numbers.
With these settings, we have the following model for the system:
1. μ and Σ are sampled from N (0 ,aI )and IW ν ( aI ), respectively.
2. For each user m , w m is sampled randomly from a Normal distribution:
w m
N ( μ, Σ 2 )
3. For each item x m,j , y m,j is sampled randomly from a Normal distribu-
tion: y m,j ∼ N ( w m x m,j 2 ).
Let θ =(Φ ,w 1 ,w 2 , ..., w M ) represent the parameters of this system that
needs to be estimated. The joint likelihood for all the variables in the proba-
bilistic model, which includes the data and the parameters, is:
P ( D, θ )= P (Φ)
m
Φ)
j
P ( w m |
P ( y m,j |
x m,j ,w m )
(8.6)
For simplicity, we assume a , b , c ,and σ are provided to the system.
Researchers have shown that the Bayesian hierarchical modeling approach
has a statistical significant improvement over the regularized linear regression
model on several real world datasets. They observed a negative correlation
between the number of training data for a user and the improvement the
system gets. This suggests that the borrowing information from other users
has more significant improvements for users with less training data, which is as
expected. However, the strength of the correlation differs over data sets, and
the amount of training data is not the only characteristic that will influence
the final performance.
One major concern about the hierarchical Bayesian modeling approach is
the computation complexity. This problem has been addressed by exploiting
the sparsity of the data space. A fast learning algorithm has been developed
and tested on a real world dataset (480,189 users, 159,836 features, and 100
million ratings). All the user models can be learned in about 4 hours using
a single CPU PC(2GB memory, P4 3GHz), and the learned models perform
much better than regularized linear regression models. This demonstrates
that the hierarchical Bayesian modeling technique can eciently handle a
large number of users and is used in a large-scale commercial system. More
details of the fast learning algorithm is beyond the scope of this chapter, and
we refer the reader to Zhang and Koren 2007 (74) for more information.
8.5 Novelty and Redundancy Detection
Although there is an extensive body of research on adaptive information
filtering, most of it is focused on identifying relevant documents. A common
 
Search WWH ::




Custom Search