            FA-PR w. preferences   FA-PR w/o preferences   FolkRank total   LocalRank
p-core 1          18,774                  20,336                39,110         < 1
p-core 5          15,320                  16,959                32,279         < 1
p-core 10          9,390                  10,466                19,856         < 1

Table 3.2: Running times for recommendations in milliseconds.
the non-biased preference weights can be computed in advance and do not have to be re-computed for
each recommendation. When relying on this re-use, the computation time for FolkRank can be cut by
about 50%.
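The saving comes from the fact that the non-biased run depends only on the graph, not on the request. The following toy weight-spreading sketch illustrates the idea; it is hypothetical code, not the authors' implementation, and the `propagate` routine, graph, and damping value are made up for illustration:

```python
# Toy illustration of FolkRank-style differential ranking: the non-biased
# weight vector is computed once and reused for every recommendation,
# so each request only pays for the preference-biased run.

def propagate(adj, start_weights, iterations=10, d=0.7):
    """Simple iterative weight spreading over an adjacency dict."""
    w = dict(start_weights)
    for _ in range(iterations):
        # Each node keeps a (1 - d) share of its start weight ...
        nxt = {n: (1 - d) * start_weights.get(n, 0.0) for n in adj}
        # ... and spreads a d share of its current weight to its neighbors.
        for n, nbrs in adj.items():
            share = d * w.get(n, 0.0) / max(len(nbrs), 1)
            for m in nbrs:
                nxt[m] += share
        w = nxt
    return w

# Tiny folksonomy graph: user u1 uses tag t1, user u2 uses tag t2.
adj = {"u1": ["t1"], "t1": ["u1"], "u2": ["t2"], "t2": ["u2"]}

uniform = {n: 1.0 / len(adj) for n in adj}
base = propagate(adj, uniform)   # non-biased weights: computed once, reused

def recommend(prefs):
    # Per request, only the preference-biased run is computed.
    biased = propagate(adj, prefs)
    return {n: biased[n] - base[n] for n in adj}
```

With a preference on `u1`, the differential weight of `t1` exceeds that of `t2`, which is the qualitative behavior the differential ranking is meant to produce.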
Implementation and memory requirements. Similar to the implementation of FolkRank, our implementation of LocalRank is memory-based, that is, all the required data is kept in memory. In fact, the time needed for the calculation of a recommendation list is consistently below one millisecond on average and does not increase with the size of the folksonomy. Besides the lower computational
complexity of the neighborhood-based LocalRank algorithm itself, the more or less constant access time
is made possible through a compact in-memory representation of the data and a pre-processing step at
startup. In the pre-processing step, simple statistics such as |Y_u| and |Y_r|, as well as the number of neighbors for
each user and resource are pre-computed. In addition, two adjacency lists are constructed that represent
the graph structure and are required for the weight propagation step: one stores which tags each
user has posted, the other one contains the tags attached to each resource.
Once the pre-processing step is performed, the generation of recommendation lists at run-time is based
on simple arithmetic operations over the data, which are organized in lookup tables. Note that
when new data comes in, the lookup tables can be very quickly updated because only local changes in
the “neighborhood” of the newly added elements have to be made.
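The pre-processing and local-update behavior described above can be sketched as follows. This is a hypothetical illustration of the described data structures, not the authors' code; the class and method names are invented:

```python
# Hypothetical sketch of the described pre-processing: two adjacency lists
# (user -> tags, resource -> tags) plus simple counting statistics, all of
# which can be updated locally when a new tag assignment arrives.
from collections import defaultdict

class FolksonomyIndex:
    def __init__(self):
        self.user_tags = defaultdict(list)      # which user posted which tags
        self.resource_tags = defaultdict(list)  # tags attached to each resource
        self.user_count = defaultdict(int)      # |Y_u|: assignments per user
        self.resource_count = defaultdict(int)  # |Y_r|: assignments per resource

    def add_assignment(self, user, resource, tag):
        # Only the entries for this (user, resource) pair are touched, so
        # incorporating new data is a cheap, local operation.
        self.user_tags[user].append(tag)
        self.resource_tags[resource].append(tag)
        self.user_count[user] += 1
        self.resource_count[resource] += 1
```

At recommendation time, ranking then reduces to dictionary lookups and simple arithmetic over these tables, which is consistent with the roughly constant access times reported above.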
The required overhead in terms of additionally required memory is limited. For the simple counting
statistics (e.g., the number of tag assignments |Y_u| per user and |Y_r| per resource, and the neighbor
counts), 4 integer arrays with a total size of 2 * |U| + 2 * |R| are required. Two further hash maps are
used to store the weights of the existing user/tag and resource/tag combinations, of which there are at
most |Y|. Finally, the two adjacency lists are of length |U| and |R|, where each list entry points to its
assigned tags; the number of distinct tags per entry, |Y_u,t| and |Y_r,t|, is bounded by |T|. Overall this
means that |Y| pointers to elements of T are required.
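A back-of-the-envelope estimate of this overhead can be computed as follows. The 4-byte integers and 8-byte pointers are assumptions for illustration, and the sketch ignores the constant per-entry overhead of the two hash maps:

```python
# Rough memory estimate for the structures described above, under the
# assumption of 4-byte integers and 8-byte pointers (not from the text).

def overhead_bytes(n_users, n_resources, n_assignments,
                   int_size=4, ptr_size=8):
    # 4 integer arrays with a total of 2*|U| + 2*|R| entries.
    arrays = (2 * n_users + 2 * n_resources) * int_size
    # |Y| pointers to elements of T in the adjacency lists.
    pointers = n_assignments * ptr_size
    return arrays + pointers
```

Even for millions of tag assignments this stays in the range of tens of megabytes, which is consistent with the claim that the full p-core 1 data set fits in memory.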
Comparison with other approaches. Based on our compact in-memory representation, even the
p-core 1 data set can be kept in memory. Note that for example in the work by [Gemmell et al., 2010]
“due to memory and time constraints” only a 10% fraction of a given Delicious data set is used. This
data set is, by the way, the largest one in their evaluation and, with 700,000 tag assignments, is
more than twenty times smaller than the p-core 1 data set used in our experiments. Note that for
even larger data sets, one additional implementation option for LocalRank would be to store the most
memory-intensive adjacency lists on disk in a (NoSQL) database. Typical database lookups with the given
hardware configuration and data volumes usually take a few milliseconds per query. The prototypical
implementation of a disk-based recommender for very large folksonomies is still an open issue in the field
of tag recommender systems.
Another work which reports prediction run times is [Rendle et al., 2009]. Here, Rendle et al. compare
the run times of their tensor factorization approach with FolkRank. After a linear time learning phase,
their algorithm makes predictions based only on the learned model. The needed prediction time depends
only on the relatively small number of factorization dimensions for users, resources, and tags as well as
the number of tags |T|. A characteristic of their method is that it achieves better accuracy results when
the model contains more dimensions (64 and 128) but is not as accurate as FolkRank when the number
of dimensions is lower (e.g., 8 or 16). In their paper, only a graphical illustration of the running times,
without exact numbers, is given. Running times range from nearly zero for the low-dimensional case up to about
10 or 15 milliseconds for the 64-factor model. Unfortunately, no numbers are given for the most accurate
128-dimensional model. While their implementation based on Object-Pascal very clearly outperforms