In the movie-ratings algorithm, the prediction $r_{u,t}$ of the general interest of a user $u$ in the concept represented in a tag $t$, that is, the tag preference, is estimated as follows:

$$ r_{u,t} = \frac{\sum_{m \in I_t} w(m,t) \cdot r_{u,m}}{\sum_{m \in I_t} w(m,t)} \qquad (4.2) $$

In this equation, $I_t$ corresponds to the set of all items tagged with $t$. The explicit overall rating that $u$ has given to movie $m$ is denoted as $r_{u,m}$. The general idea of the method is thus to propagate the
overall rating value to the tags of a movie according to their importance.
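As an illustration, the weighted average in Equation (4.2) can be sketched as follows. The data structures (`user_ratings`, `tagged_items`, `tag_weights`) and the toy values are hypothetical, chosen only to make the computation concrete:

```python
def predict_tag_preference(user_ratings, tagged_items, tag_weights, tag):
    """Estimate r_{u,t} per Equation (4.2): the w(m,t)-weighted average of
    the user's overall ratings over all items tagged with t."""
    numerator, denominator = 0.0, 0.0
    for item in tagged_items[tag]:           # I_t: all items carrying tag t
        if item not in user_ratings:         # skip items u has not rated
            continue
        w = tag_weights[(item, tag)]         # w(m,t): importance of t for m
        numerator += w * user_ratings[item]  # w(m,t) * r_{u,m}
        denominator += w
    return numerator / denominator if denominator else None

# Hypothetical example: the user rated two items tagged "funny"
user_ratings = {"m1": 4.0, "m2": 2.0}
tagged_items = {"funny": ["m1", "m2", "m3"]}
tag_weights = {("m1", "funny"): 0.9, ("m2", "funny"): 0.3,
               ("m3", "funny"): 0.5}

print(predict_tag_preference(user_ratings, tagged_items, tag_weights, "funny"))
# (0.9*4.0 + 0.3*2.0) / (0.9 + 0.3) = 3.5
```

Note how the unrated item `m3` contributes neither to the numerator nor to the denominator, which is exactly what makes the prediction undefined when the user has rated no item tagged with `t`.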
In this work, however, we are interested in predicting the rating for a tag in the context of the target user $u$ and the target item $i$. Note that the rating prediction in Equation (4.2) does not depend on the target item $i$ at all. Our tag prediction function $r_{u,i,t}$ calculates a prediction for the target tag $t$, given user $u$ and item $i$, as follows:

$$ r_{u,i,t} = \frac{\sum_{m \in \mathit{similarItems}(i, I_t, k)} w(m,t) \cdot r_{u,m}}{\sum_{m \in \mathit{similarItems}(i, I_t, k)} w(m,t)} \qquad (4.3) $$
Instead of considering all items that received a certain tag as done in [Sen et al., 2009b], we only consider items that are similar to the item at hand, thereby avoiding the averaging effect of "global" calculations. In Equation (4.3), the calculation of neighboring items is contained in the $\mathit{similarItems}(i, I_t, k)$ function, which returns the $k$ most similar items to $i$ from $I_t$.
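The neighborhood restriction in Equation (4.3) can be sketched as below. The representation of precomputed similarities as an `item_sims` dictionary is an assumption for illustration; any similarity store would do:

```python
def predict_tag_preference_local(user_ratings, item_sims, tagged_items,
                                 tag_weights, target_item, tag, k):
    """Equation (4.3): like Eq. (4.2), but averaging only over the k items
    from I_t that are most similar to the target item."""
    # similarItems(i, I_t, k): rank the items tagged with t by their
    # precomputed similarity to the target item and keep the top k.
    candidates = [m for m in tagged_items[tag] if m != target_item]
    neighbours = sorted(candidates,
                        key=lambda m: item_sims.get((target_item, m), 0.0),
                        reverse=True)[:k]
    num = den = 0.0
    for m in neighbours:
        if m in user_ratings:
            w = tag_weights[(m, tag)]
            num += w * user_ratings[m]   # w(m,t) * r_{u,m}
            den += w
    return num / den if den else None

# Hypothetical example: m1 and m3 are the 2 nearest tagged neighbours of m0
user_ratings = {"m1": 5.0, "m3": 3.0}
item_sims = {("m0", "m1"): 0.9, ("m0", "m2"): 0.1, ("m0", "m3"): 0.8}
tagged_items = {"funny": ["m1", "m2", "m3"]}
tag_weights = {("m1", "funny"): 0.5, ("m2", "funny"): 0.7,
               ("m3", "funny"): 0.5}

pred = predict_tag_preference_local(user_ratings, item_sims, tagged_items,
                                    tag_weights, "m0", "funny", k=2)
print(pred)  # (0.5*5.0 + 0.5*3.0) / (0.5 + 0.5) = 4.0
```

With `k=2`, the dissimilar item `m2` is excluded from the average even though it carries the tag, which is the intended "local" effect.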
The similarity of items is measured with the adjusted cosine similarity metric. Note that we also ran experiments using the Pearson correlation coefficient as a similarity metric, which, however, led to poorer results. As another algorithmic variant, we tried to factor in the item similarity values as additional weights in Equation (4.3). Again, this did not lead to further performance improvements but rather worsened the results.
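The adjusted cosine metric subtracts each user's mean rating before computing the cosine over co-rated items, which compensates for users who rate systematically high or low. A minimal sketch, assuming ratings are stored as a user-to-item dictionary (a hypothetical representation):

```python
import math

def adjusted_cosine(ratings, item_a, item_b):
    """Adjusted cosine similarity between two items: cosine over
    mean-centered ratings, restricted to users who rated both items.
    `ratings` maps user -> {item: rating}."""
    num = den_a = den_b = 0.0
    for user_ratings in ratings.values():
        if item_a in user_ratings and item_b in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            da = user_ratings[item_a] - mean  # deviation from user's mean
            db = user_ratings[item_b] - mean
            num += da * db
            den_a += da * da
            den_b += db * db
    if den_a == 0 or den_b == 0:
        return 0.0  # no co-rating users (or no variance): treat as unrelated
    return num / (math.sqrt(den_a) * math.sqrt(den_b))

# Hypothetical example: both users rate a and b above their own mean
ratings = {"u1": {"a": 5.0, "b": 4.0, "c": 1.0},
           "u2": {"a": 4.0, "b": 5.0, "c": 2.0}}
print(adjusted_cosine(ratings, "a", "b"))  # positive, since a and b co-vary
```

The mean-centering is what distinguishes this from plain cosine similarity; with Pearson correlation one would instead center on each *item's* mean.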
Note that when using the user's explicit overall rating $r_{u,m}$ as in Equation (4.2), no prediction can be made for the tag preference if user $u$ did not rate any item $m$ tagged with $t$, i.e., if $I_t \cap \mathit{ratedItems}(u) = \emptyset$. In our previous work [Gedikli and Jannach, 2010c], we therefore also incorporated the recursive prediction strategy from [Zhang and Pu, 2007] into the tag preference prediction process, which led to a slight performance improvement. Since a similar performance improvement was also observed for Sen et al.'s original methods, we will not discuss this generally applicable technique here in greater depth; see [Gedikli and Jannach, 2010c] for details of the evaluation.
4.4 Predicting item ratings from tag preferences
In [Sen et al., 2009b], the best-performing tag-based recommendation algorithm with respect to precision
is a hybrid which combines the SVM-based method regress-tag and the tag-agnostic matrix factoriza-
tion approach funk-svd [Funk, 2006]. In this work, we therefore propose to parameterize and evaluate
the regress-tag method using item-specific tag preferences. Note again that these tag preferences can
be explicitly available or derived as described above in Equation (4.3). In addition to that, we report
accuracy results when using item-specific tag preferences for Sen et al.'s cosine-tag method, in order to
study how this approach, which we proposed in our previous work [Gedikli and Jannach, 2010c], performs
on additional data sets.
The regress-tag algorithm. The regress-tag method from [Sen et al., 2009b] is based on determining linear equations, one for each movie, which capture the possibly complex relationship between the user's tag preferences for the tags of a given item and the overall item rating. The prediction function for a user $u$ and an item $i$ is defined as follows, where $h_0$ to $h_n$ are the coefficients of the linear equations and $r_{u,t_i}$ from Equation (4.2) corresponds to the estimated tag preference for the tags $t_1, \ldots, t_n$ attached to item $i$:
$$ \mathit{regress\text{-}tag}(u, i) = h_0 + h_1 \cdot r_{u,t_1} + \ldots + h_n \cdot r_{u,t_n} \qquad (4.4) $$

In [Sen et al., 2009b], the coefficients $h_0$ to $h_n$ were chosen with the help of regression support vector machines and the libsvm library [Chang and Lin, 2011], because this led to the most accurate results when compared with other methods for choosing the parameters, such as least-squares optimization.
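To make the per-movie linear model of Equation (4.4) concrete, the sketch below fits the coefficients with ordinary least squares rather than the regression SVMs used in [Sen et al., 2009b] (least squares is one of the alternatives the authors compared against). The data matrix and values are hypothetical:

```python
import numpy as np

def fit_regress_tag(X, y):
    """Fit h_0..h_n of Eq. (4.4) for one movie by ordinary least squares.
    X: one row per user who rated the movie, holding that user's estimated
    tag preferences r_{u,t_1}..r_{u,t_n}; y: the user's overall rating.
    (The book uses regression SVMs via libsvm; OLS is illustrative only.)"""
    A = np.column_stack([np.ones(len(X)), X])  # prepend intercept column h_0
    h, *_ = np.linalg.lstsq(A, y, rcond=None)
    return h

def regress_tag(h, tag_prefs):
    """Predict the overall item rating from tag preferences (Eq. 4.4)."""
    return h[0] + np.dot(h[1:], tag_prefs)

# Hypothetical example: three users, a movie with two tags
X = np.array([[4.0, 1.0], [3.0, 2.0], [5.0, 1.5]])  # tag preferences
y = np.array([4.5, 3.0, 5.0])                        # overall ratings
h = fit_regress_tag(X, y)
print(regress_tag(h, np.array([4.0, 1.0])))
```

Because the model is fitted per movie, each movie ends up with its own coefficient vector, so the same tag can carry different weight for different items.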