Adaptive Information Filtering - Text Mining: Classification, Clustering, and Applications

Database Reference

In-Depth Information

I is the K dimensional identity matrix, and a , b ,and c are real numbers.

With these settings, we have the following model for the system:

1. μ and Σ are sampled from N (0 ,aI )and IW ν ( aI ), respectively.

2. For each user m , w m is sampled randomly from a Normal distribution:

w m ∼

N ( μ, Σ 2 )

3. For each item x m,j , y m,j is sampled randomly from a Normal distribu-

tion: y m,j ∼ N ( w m x m,j ,σ 2 ).

Let θ =(Φ ,w 1 ,w 2 , ..., w M ) represent the parameters of this system that

needs to be estimated. The joint likelihood for all the variables in the proba-

bilistic model, which includes the data and the parameters, is:

P ( D, θ )= P (Φ)

m

Φ)

j

P ( w m |

P ( y m,j |

x m,j ,w m )

(8.6)

For simplicity, we assume a , b , c ,and σ are provided to the system.

Researchers have shown that the Bayesian hierarchical modeling approach

has a statistical significant improvement over the regularized linear regression

model on several real world datasets. They observed a negative correlation

between the number of training data for a user and the improvement the

system gets. This suggests that the borrowing information from other users

has more significant improvements for users with less training data, which is as

expected. However, the strength of the correlation differs over data sets, and

the amount of training data is not the only characteristic that will influence

the final performance.

One major concern about the hierarchical Bayesian modeling approach is

the computation complexity. This problem has been addressed by exploiting

the sparsity of the data space. A fast learning algorithm has been developed

and tested on a real world dataset (480,189 users, 159,836 features, and 100

million ratings). All the user models can be learned in about 4 hours using

a single CPU PC(2GB memory, P4 3GHz), and the learned models perform

much better than regularized linear regression models. This demonstrates

that the hierarchical Bayesian modeling technique can eciently handle a

large number of users and is used in a large-scale commercial system. More

details of the fast learning algorithm is beyond the scope of this chapter, and

we refer the reader to Zhang and Koren 2007 (74) for more information.

8.5 Novelty and Redundancy Detection

Although there is an extensive body of research on adaptive information

filtering, most of it is focused on identifying relevant documents. A common

Text Mining: Classification, Clustering, and Applications

Search WWH ::

Custom Search

Home