8.4 Collaborative Adaptive Filtering
One major challenge of building a recommendation or personalization sys-
tem is that the profile learned for a particular user is usually of low quality
when the amount of data from that particular user is small. This is known as
the “cold start” problem: any new user must endure poor
initial performance until sufficient feedback from that user is provided to learn
a reliable user profile.
There has been much research on improving classification accuracy when
the amount of labeled training data is small. The semi-supervised learning
approach combines unlabeled and labeled data together to achieve this goal.
Another approach is using domain knowledge. Researchers have modified
different learning algorithms, such as Naïve Bayes (33), logistic regression
(21), and SVMs (62), to integrate domain knowledge into a text classifier.
The third approach is borrowing training data from other resources (16) (21).
The effectiveness of these different approaches is mixed, depending on how well the
underlying model assumptions fit the data.
This section introduces one well-received approach to improve recommen-
dation system performance for a particular user: borrowing information from
other users through a Bayesian hierarchical modeling approach. Several re-
searchers have demonstrated that this approach effectively trades off between
shared and user-specific information, thus alleviating poor initial performance
for each user (76) (67) (74).
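The trade-off between shared and user-specific information can be illustrated with a deliberately simplified sketch. It is not the model used in the cited work: it assumes each user is summarized by a single scalar (a mean feedback value) with a Gaussian prior centred on the mean over all users, and the function name `shrunken_user_means` and the parameter `prior_strength` are hypothetical. Under those assumptions, a user with little data is pulled toward the shared estimate, while a user with much data keeps an estimate close to their own.

```python
from statistics import mean


def shrunken_user_means(feedback_by_user, prior_strength=5.0):
    """Blend each user's own mean with the mean over all users.

    feedback_by_user: dict mapping a user id to a list of numeric feedback.
    prior_strength:   hypothetical pseudo-count (assumption): how many
                      observations the shared estimate is worth. It plays
                      the role of noise variance / prior variance in a
                      Gaussian model with known variances.
    """
    all_values = [v for values in feedback_by_user.values() for v in values]
    shared_mean = mean(all_values)

    estimates = {}
    for user, values in feedback_by_user.items():
        n = len(values)
        # Posterior mean under a Gaussian prior centred on shared_mean:
        # the user's own data gets weight n / (n + prior_strength).
        estimates[user] = (sum(values) + prior_strength * shared_mean) / (
            n + prior_strength
        )
    return shared_mean, estimates


if __name__ == "__main__":
    data = {"new_user": [1.0], "long_time_user": [0.0] * 9}
    shared, per_user = shrunken_user_means(data, prior_strength=1.0)
    print(shared, per_user)
```

A user with no feedback at all receives exactly the shared estimate, which is the behaviour one wants during cold start.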
Assume there are M users in the adaptive filtering system. The task of
the system is to deliver documents that are relevant to each user. For each
user, the system learns a user model from the user's history. In the rest of
this section, the following notation is used to represent the variables in the
system.
m = 1, 2, ..., M: The index for each individual user. M is the total number of
users.
w_m: The user model parameter associated with user m. w_m is a K-dimensional
vector.
j = 1, 2, ..., J_m: The index for a set of data for user m. J_m is the number of
training data for user m.
D_m = {(x_{m,j}, y_{m,j})}: A set of data associated with user m. x_{m,j} is a
K-dimensional vector that represents the m-th user's j-th training document
(see footnote 4). y_{m,j} is a scalar that represents the label of document
x_{m,j}.
k = 1, 2, ..., K: The dimensional index of input variable x.
4 The first dimension of x is a dummy variable that always equals 1.
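As a rough illustration of how this notation might map onto data structures, the sketch below stores one training set D_m and one parameter vector w_m per user. The names `UserData` and `init_system` are hypothetical, and initializing w_m to zeros is an assumption made only so the example runs; the text does not prescribe either.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class UserData:
    """Training set D_m for one user: pairs (x_{m,j}, y_{m,j})."""

    K: int  # dimensionality of x, including the dummy first dimension
    examples: List[Tuple[List[float], float]] = field(default_factory=list)

    def add(self, features, label):
        # Prepend the dummy variable so the first dimension of x is always 1.
        x = [1.0] + [float(v) for v in features]
        if len(x) != self.K:
            raise ValueError(f"expected {self.K - 1} features, got {len(x) - 1}")
        self.examples.append((x, float(label)))

    @property
    def J(self):
        """J_m: the number of training examples for this user."""
        return len(self.examples)


def init_system(M, K):
    """Create empty data sets D_1..D_M and K-dimensional vectors w_1..w_M.

    Zero initialization of w_m is an assumption for illustration only.
    """
    users = [UserData(K=K) for _ in range(M)]
    weights = [[0.0] * K for _ in range(M)]
    return users, weights


if __name__ == "__main__":
    users, weights = init_system(M=3, K=4)
    users[0].add([0.2, 0.0, 0.7], label=1)
    print(users[0].J, users[0].examples[0], weights[0])
```

Keeping the dummy dimension inside x means a bias term can live in w_m itself rather than being handled separately.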
 