            FA-PR w. preferences   FA-PR w/o preferences   FolkRank total   LocalRank
p-core 1          18,774                  20,336                39,110         < 1
p-core 5          15,320                  16,959                32,279         < 1
p-core 10          9,390                  10,466                19,856         < 1

Table 3.2: Running times for recommendations in milliseconds.
the non-biased preference weights can be computed in advance and do not have to be re-computed for
each recommendation. When relying on this re-use, the computation time for FolkRank can be cut by
about 50%.
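The saving comes from the fact that the non-biased run depends only on the graph, not on the request. The following toy weight-spreading sketch illustrates the idea; it is hypothetical code, not the authors' implementation, and the `propagate` routine, graph, and damping value are made up for illustration:

```python
# Toy illustration of FolkRank-style differential ranking: the non-biased
# weight vector is computed once and reused for every recommendation,
# so each request only pays for the preference-biased run.

def propagate(adj, start_weights, iterations=10, d=0.7):
    """Simple iterative weight spreading over an adjacency dict."""
    w = dict(start_weights)
    for _ in range(iterations):
        # Each node keeps a (1 - d) share of its start weight ...
        nxt = {n: (1 - d) * start_weights.get(n, 0.0) for n in adj}
        # ... and spreads a d share of its current weight to its neighbors.
        for n, nbrs in adj.items():
            share = d * w.get(n, 0.0) / max(len(nbrs), 1)
            for m in nbrs:
                nxt[m] += share
        w = nxt
    return w

# Tiny folksonomy graph: user u1 uses tag t1, user u2 uses tag t2.
adj = {"u1": ["t1"], "t1": ["u1"], "u2": ["t2"], "t2": ["u2"]}

uniform = {n: 1.0 / len(adj) for n in adj}
base = propagate(adj, uniform)   # non-biased weights: computed once, reused

def recommend(prefs):
    # Per request, only the preference-biased run is computed.
    biased = propagate(adj, prefs)
    return {n: biased[n] - base[n] for n in adj}
```

With a preference on `u1`, the differential weight of `t1` exceeds that of `t2`, which is the qualitative behavior the differential ranking is meant to produce.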
Implementation and memory requirements. Similar to the implementation of FolkRank, our implementation of LocalRank is memory-based, that is, all the required data is kept in memory. In fact, the time needed for the calculation of a recommendation list is consistently below one millisecond on average and does not increase with the size of the folksonomy. Besides the lower computational
complexity of the neighborhood-based LocalRank algorithm itself, the more or less constant access time
is made possible through a compact in-memory representation of the data and a pre-processing step at
startup. In the pre-processing step, simple statistics such as |Y_u| and |Y_r|, as well as the number of neighbors for
each user and resource are pre-computed. In addition, two adjacency lists are constructed that represent
the graph structure and are required for the weight propagation step: one stores which tags each
user has posted, the other one contains the tags attached to each resource.
Once the pre-processing step is performed, the generation of recommendation lists at run-time is based
on simple arithmetic operations over the data, which are organized in lookup tables. Note that
when new data comes in, the lookup tables can be very quickly updated because only local changes in
the “neighborhood” of the newly added elements have to be made.
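The pre-processing and local-update behavior described above can be sketched as follows. This is a hypothetical illustration of the described data structures, not the authors' code; the class and method names are invented:

```python
# Hypothetical sketch of the described pre-processing: two adjacency lists
# (user -> tags, resource -> tags) plus simple counting statistics, all of
# which can be updated locally when a new tag assignment arrives.
from collections import defaultdict

class FolksonomyIndex:
    def __init__(self):
        self.user_tags = defaultdict(list)      # which user posted which tags
        self.resource_tags = defaultdict(list)  # tags attached to each resource
        self.user_count = defaultdict(int)      # |Y_u|: assignments per user
        self.resource_count = defaultdict(int)  # |Y_r|: assignments per resource

    def add_assignment(self, user, resource, tag):
        # Only the entries for this (user, resource) pair are touched, so
        # incorporating new data is a cheap, local operation.
        self.user_tags[user].append(tag)
        self.resource_tags[resource].append(tag)
        self.user_count[user] += 1
        self.resource_count[resource] += 1
```

At recommendation time, ranking then reduces to dictionary lookups and simple arithmetic over these tables, which is consistent with the roughly constant access times reported above.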
The required overhead in terms of additionally required memory is limited. For the simple counting
statistics (e.g., the number of tag assignments |Y_u| per user and |Y_r| per resource, and the neighbor
counts), 4 integer arrays with a total size of 2 * |U| + 2 * |R| are required. Two further hash maps are
used to store the weights of the existing user/tag and resource/tag combinations, of which there are at
most |Y|. Finally, the two adjacency lists are of length |U| and |R|, where each list entry points to its
assigned tags; the number of distinct tags per entry, |Y_u,t| and |Y_r,t|, is bounded by |T|. Overall this
means that |Y| pointers to elements of T are required.
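A back-of-the-envelope estimate of this overhead can be computed as follows. The 4-byte integers and 8-byte pointers are assumptions for illustration, and the sketch ignores the constant per-entry overhead of the two hash maps:

```python
# Rough memory estimate for the structures described above, under the
# assumption of 4-byte integers and 8-byte pointers (not from the text).

def overhead_bytes(n_users, n_resources, n_assignments,
                   int_size=4, ptr_size=8):
    # 4 integer arrays with a total of 2*|U| + 2*|R| entries.
    arrays = (2 * n_users + 2 * n_resources) * int_size
    # |Y| pointers to elements of T in the adjacency lists.
    pointers = n_assignments * ptr_size
    return arrays + pointers
```

Even for millions of tag assignments this stays in the range of tens of megabytes, which is consistent with the claim that the full p-core 1 data set fits in memory.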
Comparison with other approaches. Based on our compact in-memory representation, even the
p-core 1 data set can be kept in memory. Note that for example in the work by [Gemmell et al., 2010]
“due to memory and time constraints” only a 10% fraction of a given Delicious data set is used. This
data set is, by the way, the largest one in their evaluation and, with 700,000 tag assignments, is
more than twenty times smaller than the p-core 1 data set used in our experiments. Note that for
even larger data sets, one additional implementation option for LocalRank would be to store the most
memory-intensive adjacency lists on disk in a (NoSQL) database. Typical database lookups with the given
hardware configuration and data volumes usually take a few milliseconds per query. The prototypical
implementation of a disk-based recommender for very large folksonomies is still an open issue in the field
of tag recommender systems.
Another work which reports prediction run times is [Rendle et al., 2009]. Here, Rendle et al. compare
the run times of their tensor factorization approach with FolkRank. After a linear time learning phase,
their algorithm makes predictions based only on the learned model. The needed prediction time depends
only on the relatively small number of factorization dimensions for users, resources, and tags as well as
the number of tags |T|. A characteristic of their method is that it achieves better accuracy results when
the model contains more dimensions (64 and 128) but is not as accurate as FolkRank when the number
of dimensions is lower (e.g., 8 or 16). In their paper, only a graphical illustration of the running times,
without exact numbers, is given. Running times range from nearly zero for the low-dimensional case up to about
10 or 15 milliseconds for the 64-factor model. Unfortunately, no numbers are given for the most accurate
128-dimensional model. While their implementation based on Object-Pascal very clearly outperforms