Exploiting Diversification in Gossip-Based Recommendation - Data Management in Cloud, Grid and P2P Systems


are often willing to share their data with other users in a community of interest.

However, the fact that their data spaces are distributed in many different systems

makes data sharing especially difficult. For instance, an artist photographer who

wants to share her pictures within an online community of photographers may

have to log in to several different Web applications such as deviantArt, Facebook

or Flickr, each with a different interface and account. Similarly, a scientist who

needs to search for scientific datasets within an online community of scientists

will be faced with the problem that the relevant data is typically distributed

in many different labs' servers or scientists' local computers. Furthermore, since

this data is hidden from web crawlers, traditional search engines become useless.

In order to mitigate this problem, some Web applications allow grouping several

accounts and data from different systems (e.g. Facebook allows users to group

DropBox and blogs into a single Facebook account). However, they are limited

to a few well-known systems.

In this context of large-scale distribution of users and data, a general solution

to data sharing is offered by distributed search and recommendation [1, 2]. In

this paper, we adopt a peer-to-peer gossip-based approach, because it provides

important properties such as scalability, dynamicity, autonomy and decentralized

control. Within an online community, each user u is associated with a virtual data

space that contains all the data items (stored in different systems) it shares.

Given u and a keyword query q, the goal of our search and recommendation

approach is to recommend to u items that are relevant with respect to q and

that are shared by other users, regardless of the systems that store the items.

Then, a recommended item is simply a reference that can be used to retrieve

the actual data item. In other words, we combine search and recommendation

in the sense that a user u searches relevant items among those recommended by

users similar to u.

Distributed search and recommendation has received considerable attention

[1-4]. However, achieving high recall remains an open problem. A

query is generally forwarded only to a subset of users, who process it

and return recommendations. To compute this subset of users,

many solutions cluster relevant user profiles implicitly using gossip protocols.

Gossip protocols are known to be highly resilient and scalable and to converge

quickly [5], which makes them a good alternative for distributed search and

recommendation. A User Network (U-Net in the following) refers to the cluster of

relevant users that a user u becomes aware of through gossiping, selected using

a score (e.g. similarity between u and the users in U-Net). At each gossip round,

the most relevant users are kept in U-Net. Since U-Net is used to guide

recommendations given a keyword query, the relevance score used in the clustering

process plays a very important role in increasing recall (i.e. the number of

relevant items retrieved with respect to the whole set of items, known as the global corpus).
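The U-Net update described above can be illustrated with a minimal sketch. It assumes that each user profile is a set of item identifiers, that U-Net holds at most k users, and that candidates received during a gossip exchange are merged with the current U-Net before the top k are kept. The names gossip_round and shared_items, the parameter k, and the default score (a count of shared items) are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def shared_items(a, b):
    """Hypothetical placeholder score: number of items two profiles share."""
    return len(a & b)


def gossip_round(u_profile, u_net, received, k, score=shared_items):
    """Return the U-Net of user u after one gossip round (illustrative only).

    u_profile: set of item ids shared by u
    u_net:     dict mapping user id -> profile (current U-Net of u)
    received:  dict mapping user id -> profile (candidates learned by gossiping)
    k:         assumed maximum size of U-Net
    """
    # Merge the current U-Net with the newly received candidates.
    candidates = {**u_net, **received}
    # Rank by decreasing score; ties are broken by user id for determinism.
    ranked = sorted(candidates, key=lambda v: (-score(u_profile, candidates[v]), v))
    # Keep only the k most relevant users.
    return {v: candidates[v] for v in ranked[:k]}


u_profile = {"a", "b", "c"}
u_net = {"v1": {"a", "b"}, "v2": {"x"}}
received = {"v3": {"a", "b", "c"}, "v4": {"c", "y"}}
new_u_net = gossip_round(u_profile, u_net, received, k=2)
```

Because the score is passed as a parameter, any relevance score can be plugged in without changing the selection step, which is where the choice of score affects which users end up guiding recommendations.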

Relevance scores (e.g. Jaccard, overlap) define how well a user profile v meets

the needs of another user u. Most of the existing solutions exploit different kinds

of relevance scores to increase recall [2-4, 6, 7]. However, recall remains limited.
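As a rough illustration of the two scores mentioned above, the sketch below computes them over user profiles represented as sets of item identifiers. This representation is an assumption, as is the reading of "overlap" as the overlap coefficient (intersection size divided by the size of the smaller profile); the text does not say which variant is meant.

```python
def jaccard(u_items, v_items):
    """Jaccard similarity: |U ∩ V| / |U ∪ V| (0.0 when both profiles are empty)."""
    union = u_items | v_items
    return len(u_items & v_items) / len(union) if union else 0.0


def overlap(u_items, v_items):
    """Overlap coefficient (assumed variant): |U ∩ V| / min(|U|, |V|)."""
    smaller = min(len(u_items), len(v_items))
    return len(u_items & v_items) / smaller if smaller else 0.0


u = {"a", "b", "c"}
v = {"b", "c", "d"}
j = jaccard(u, v)   # 2 shared items out of 4 distinct items
o = overlap(u, v)   # 2 shared items, smaller profile has 3 items
```

Both scores depend only on the items that u and v have in common, so a user whose profile closely matches that of u scores highly under either one.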
