Adaptive Information Filtering - Text Mining: Classification, Clustering, and Applications


8.4 Collaborative Adaptive Filtering

One major challenge in building a recommendation or personalization system is that the profile learned for a particular user is usually of low quality when little data from that user is available. This is known as the “cold start” problem: any new user must endure poor initial performance until that user has provided sufficient feedback to learn a reliable user profile.

There has been much research on improving classification accuracy when the amount of labeled training data is small. The semi-supervised learning approach combines unlabeled and labeled data to achieve this goal. A second approach is to use domain knowledge: researchers have modified different learning algorithms, such as Naïve Bayes (33), logistic regression (21), and SVMs (62), to integrate domain knowledge into a text classifier. A third approach is to borrow training data from other resources (16) (21). The effectiveness of these approaches is mixed, depending on how well the underlying model assumptions fit the data.

This section introduces one well-received approach to improving recommendation system performance for a particular user: borrowing information from other users through Bayesian hierarchical modeling. Several researchers have demonstrated that this approach effectively trades off shared against user-specific information, thus alleviating poor initial performance for each user (76) (67) (74).
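The trade-off between shared and user-specific information can be illustrated with a deliberately simplified toy, which is not the model developed in this chapter. The sketch assumes that each user's parameter is a single scalar, that a Gaussian prior centred on a shared value is already known (in a real hierarchical model it would be learned from all users), and that feedback has Gaussian noise. The names `shrunk_estimate`, `prior_var`, and `noise_var` are hypothetical.

```python
from statistics import fmean


def shrunk_estimate(user_values, shared_mean, prior_var=1.0, noise_var=1.0):
    """Posterior mean of one user's scalar parameter.

    Assumes a Gaussian prior N(shared_mean, prior_var) on the parameter and
    Gaussian observation noise with variance noise_var (toy assumptions).
    """
    n = len(user_values)
    if n == 0:
        # No feedback yet: fall back entirely on the shared information.
        return shared_mean
    prior_precision = 1.0 / prior_var
    data_precision = n / noise_var
    user_mean = fmean(user_values)
    # Precision-weighted average of shared and user-specific information.
    return (prior_precision * shared_mean + data_precision * user_mean) / (
        prior_precision + data_precision
    )


shared = 0.2                      # value shared across users (assumed given)
print(shrunk_estimate([], shared))            # brand-new user
print(shrunk_estimate([1.0], shared))         # one piece of feedback
print(shrunk_estimate([1.0] * 50, shared))    # long-standing user
```

With no feedback the estimate equals the shared value, and as a user's feedback accumulates it moves toward that user's own average. This is the qualitative behaviour that makes borrowing from other users helpful during cold start.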

Assume there are M users in the adaptive filtering system. The task of the system is to deliver to each user the documents that are relevant to that user. For each user, the system learns a user model from the user's history. In the rest of this section, the following notation is used to represent the variables in the system.

$m = 1, 2, \ldots, M$: The index for each individual user. $M$ is the total number of users.

$w_m$: The user model parameter associated with user $m$. $w_m$ is a $K$-dimensional vector.

$j = 1, 2, \ldots, J_m$: The index for a set of data for user $m$. $J_m$ is the number of training data for user $m$.

$D_m = \{(x_{m,j}, y_{m,j})\}$: A set of data associated with user $m$. $x_{m,j}$ is a $K$-dimensional vector that represents the $m$th user's $j$th training document.⁴ $y_{m,j}$ is a scalar that represents the label of document $x_{m,j}$.

$k = 1, 2, \ldots, K$: The dimensional index of input variable $x$.

⁴ The first dimension of $x$ is a dummy variable that always equals 1.
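The notation above can be mirrored in a small data structure. The following is a minimal sketch, not an implementation from the text: the class `UserData`, its `add` method, and the function `score` are hypothetical names, and the linear score $w_m \cdot x_{m,j}$ is an assumption made only to show how the dummy first dimension would act as a bias term in a linear user model.

```python
from dataclasses import dataclass, field


@dataclass
class UserData:
    """Training set D_m for one user: a list of pairs (x_{m,j}, y_{m,j})."""
    K: int                                   # dimensionality of x and w_m
    examples: list = field(default_factory=list)

    def add(self, features, label):
        # Prepend the dummy variable that always equals 1 (footnote 4).
        x = [1.0] + [float(v) for v in features]
        if len(x) != self.K:
            raise ValueError(f"expected {self.K - 1} features, got {len(features)}")
        self.examples.append((x, label))

    @property
    def J(self):
        """J_m: the number of training examples for this user."""
        return len(self.examples)


def score(w, x):
    """Inner product of a user model w_m with a document vector x (assumed linear model)."""
    return sum(wk * xk for wk, xk in zip(w, x))


K = 3                                        # dummy dimension + 2 real features
data = {m: UserData(K) for m in (1, 2)}      # M = 2 users
models = {m: [0.0] * K for m in data}        # one K-dimensional w_m per user

data[1].add([0.5, 2.0], label=1)
data[1].add([0.0, 1.0], label=0)
print(data[1].J, data[2].J)                  # user 2 is a cold-start user
print(score(models[1], data[1].examples[0][0]))
```

Keeping one `UserData` and one parameter vector per user makes the cold-start situation explicit: a new user has an empty `examples` list, so nothing user-specific can be learned until feedback arrives.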
