Adaptive Information Filtering - Text Mining: Classification, Clustering, and Applications

FIGURE 8.2: Illustration of dependencies of variables in the hierarchical model. The rating, $y$, for a document, $x$, is conditioned on the document and the user model, $w_m$, associated with the user $m$. Users share information about their models through the prior, $\Phi = (\mu, \Sigma)$.

The Bayesian hierarchical modeling approach has been widely used in real-world information retrieval applications. Generalized Bayesian hierarchical linear models, a simple set of Bayesian hierarchical models, are commonly used and have achieved good performance on collaborative filtering (67) and content-based adaptive filtering (76) (74) tasks. Figure 8.2 shows the graphical representation of a Bayesian hierarchical model. In this graph, each user model is represented by a random vector $w_m$. Assume a user model is sampled randomly from a prior distribution $P(w \mid \Phi)$. The system can predict the user label $y$ of a document $x$, given an estimate of $w_m$ (or of $w_m$'s distribution), using a function $y = f(x, w)$. The model is called a generalized Bayesian hierarchical linear model when $y = f(w^T x)$ is any generalized linear model, such as logistic regression, SVM, or linear regression. To reliably estimate the user model $w_m$, the system can borrow information from other users through the prior $\Phi = (\mu, \Sigma)$.
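
As a minimal sketch of the prediction step $y = f(w^T x)$, the Python fragment below evaluates two of the generalized linear models named above: linear regression (where $f$ is the identity) and logistic regression (where $f$ is the sigmoid). The function names and the example vectors are hypothetical and chosen only for illustration; they do not come from the cited systems.

```python
import math


def dot(w, x):
    """Inner product w^T x of two equal-length vectors."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x))


def predict_linear(w, x):
    """Linear regression: f is the identity, so y = w^T x."""
    return dot(w, x)


def predict_logistic(w, x):
    """Logistic regression: f is the sigmoid, read here as P(y = 1 | x, w)."""
    z = dot(w, x)
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical user model w_m and document feature vector x
# (illustrative values only).
w_m = [0.5, -1.0, 2.0]
x = [1.0, 0.0, 0.25]

rating = predict_linear(w_m, x)        # 0.5*1.0 + (-1.0)*0.0 + 2.0*0.25
p_relevant = predict_logistic(w_m, x)  # sigmoid of the same inner product
```

Both predictors share the same user model and differ only in the link function $f$, which is why the same hierarchical prior over $w_m$ can serve either of them.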

Now we look at one commonly used model, in which $y = w^T x + \epsilon$ and $\epsilon \sim N(0, \sigma^2)$ is random noise (67) (76). Assume that each user model $w_m$ is an independent draw from a population distribution $P(w \mid \Phi)$, which is governed by some unknown hyperparameter $\Phi$. Let the prior distribution of user model $w$ be a Gaussian distribution with parameter $\Phi = (\mu, \Sigma)$, which is the commonly used prior for linear models. $\mu = (\mu_1, \mu_2, \ldots, \mu_K)$ is a $K$-dimensional vector that represents the mean of the Gaussian distribution, and $\Sigma$ is the covariance matrix of the Gaussian. Usually, a Normal distribution $N(0, aI)$ and an Inverse Wishart distribution $P(\Sigma) \propto |\Sigma|^{-\frac{1}{2} b} \exp\left(-\frac{1}{2} c \, \mathrm{tr}(\Sigma^{-1})\right)$ are used as hyperpriors to model the prior distributions of $\mu$ and $\Sigma$, respectively.
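
To illustrate how the Gaussian prior lets a user borrow information from the population, the sketch below computes the posterior mean of a single user model $w_m$ under the linear-Gaussian model above. It assumes, purely for illustration, that $\mu$, $\Sigma$, and $\sigma^2$ are already known; the text treats $\Phi$ as unknown, and estimating it is not shown here. Under that assumption the standard Bayesian linear regression result is $\hat{w}_m = (\Sigma^{-1} + X^T X / \sigma^2)^{-1} (\Sigma^{-1} \mu + X^T y / \sigma^2)$, where the rows of $X$ are the documents the user rated. The code solves the equivalent system $(I + \Sigma X^T X / \sigma^2)\, \hat{w}_m = \mu + \Sigma X^T y / \sigma^2$ so that $\Sigma$ never has to be inverted. The names `solve` and `posterior_mean` are hypothetical helpers, and the code is kept dependency-free rather than tuned for speed.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def posterior_mean(mu, Sigma, X, y, sigma2):
    """Posterior mean of w_m given prior N(mu, Sigma) and noise variance sigma2.

    Assumes mu, Sigma and sigma2 are known (an illustrative simplification).
    X is a list of document feature vectors, y the matching ratings.
    """
    K = len(mu)
    # G = X^T X and h = X^T y, accumulated over this user's rated documents.
    G = [[sum(row[i] * row[j] for row in X) for j in range(K)] for i in range(K)]
    h = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(K)]
    # A = I + Sigma G / sigma2 and b = mu + Sigma h / sigma2.
    A = [[(1.0 if i == j else 0.0)
          + sum(Sigma[i][k] * G[k][j] for k in range(K)) / sigma2
          for j in range(K)] for i in range(K)]
    b = [mu[i] + sum(Sigma[i][k] * h[k] for k in range(K)) / sigma2
         for i in range(K)]
    return solve(A, b)


# Hypothetical prior and a user with a single rated document.
mu = [0.2, -0.1]
Sigma = [[1.0, 0.0], [0.0, 1.0]]
X = [[1.0, 0.0]]
y = [1.0]

w_new_user = posterior_mean(mu, Sigma, X, y, sigma2=0.5)
w_no_data = posterior_mean(mu, Sigma, [], [], sigma2=0.5)
```

With no ratings the estimate falls back to the prior mean $\mu$. With one rating, the first component moves from the prior value toward the value suggested by the data, while the second component, about which this document says nothing, stays at the prior mean.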
