Exploiting Diversification in Gossip-Based Recommendation - Data Management in Cloud, Grid and P2P Systems - page 34

Database Reference

In-Depth Information

1

1

1

0.8

0.8

0.8

0.6

0.6

0.6

Diversified

Undiversified

0.4

0.4

0.4

0.2

0.2

0.2

0

0

0

Jaccard Cosine over big Overlap

Jaccard Cosine over big Overlap

Jaccard Cosine over big Overlap

(a) MovieLens

(b) Flickr

(c) LastFM

1

1

1

0.8

0.8

0.8

Usefulness (Jaccard)

xQuad (Jaccard)

MMR (Jaccard)

Jaccard

0.6

0.6

0.6

0.4

0.4

0.4

0.2

0.2

0.2

0

0

0

(d) MovieLens

(e) Flickr

(f) LastFM

Fig. 2. Effect on recall of diversification

Not surprisingly, diversifying the U-Net enables for all relevance score to sig-

nificantly increase recall. On the MovieLens dataset, the recall results without

diversification range between 0 . 58 and 0 . 62 while they range between 0 . 978 and

0 . 999 with diversification. On the Flickr dataset, the gains are slightly smaller.

Since all users share their own pictures, their profiles are very different and al-

ready diversified. Therefore, diversification has less impact on the recall. Finally,

the LastFM dataset recall results are up to 3.26 times higher.

In addition to improve the recall, diversified solutions enable to reduce the

variance compared to undiversified solutions. For instance, on Flickr ,thevari-

ance decreases from 0 . 116 to 0 . 013 when using Jaccard . This can be explained

by the fact that in the undiversified solution, users in U-Net are very similar

among them. As a consequence, either all are relevant to the query, and hence

they provide a high recall; or none of them is, thus producing a low recall. Diver-

sification enables to increase coverage and therefore, it increases the probability

to answer any kind of query.

In addition, we ran these experiments with different sizes of U-Net and values

of TTL . For instance, on the MovieLens dataset, with a U-Net of size 5 and a

TTL of 2, the recall is in average 2 . 37 times higher compared to undiversified

solutions. Indeed, without diversification, recall values are in average of 0 . 26

while they reach 0 , 61 using usefulness.

We have also compared three different diversification methods. The first is

the usefulness score presented in Equation 7. The second method we use is the

Maximal Marginal Relevance ,knownas MMR [13]. MMR chooses users that

minimize the maximum similarity between any two users in u 's U-Net . Finally,

the last method is Explicit Query Aspect Diversification known as xQuad [14].

xQuad chooses users such that each user v i in u 's U-Net is similar to u in a

different way. For instance, suppose that u shares items i 1 and i 2 .If v 1 is in u 's

U-Net and is similar to u because it also shares i 1 , then, xQuad chooses a user

v 2 such that v 2 is similar to u because it shares the item i 2 . In this experiment,

we use Jaccard as the similarity measure.

Next Page

Data Management in Cloud, Grid and P2P Systems

Search WWH ::

Custom Search

Home