Table 6.1 Levels of sparsity for a selection of well-known data sets

Data set                  Sparsity      Proportion of interactions   References
Netflix prize challenge   0.98842593    86.4                         [7]
Book-crossings            0.99998546    68796.6                      [62]
Movielens 100k            0.95840128    15.9                         [26]
Movielens 1M              0.98691797    23.9                         [26]
Movielens 10M             0.98827612    76.4                         [26]
EachMovie                 0.97631161    42.2                         [58]
Jester                    0.43662440    1.8                          [58]
Y!Music                   0.99915117    1178.8                       [20]
News Portal 1             0.99998499    66622.8
News Portal 2             0.99996663    2996.8

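
As a minimal sketch of how a sparsity value like those in Table 6.1 might be computed, the snippet below assumes that sparsity means the fraction of user-item pairs without any recorded interaction, and that only users and items appearing in the log are counted. Both assumptions, the function name `sparsity`, and the toy log are illustrative and not taken from the text.

```python
def sparsity(interactions):
    """Return the share of (user, item) pairs with no recorded interaction.

    Assumption: `interactions` is an iterable of (user, item) tuples, and
    the user and item sets are exactly those that occur in it.
    """
    pairs = set(interactions)  # repeated interactions count once
    if not pairs:
        raise ValueError("need at least one interaction")
    users = {user for user, _ in pairs}
    items = {item for _, item in pairs}
    density = len(pairs) / (len(users) * len(items))
    return 1.0 - density


# Hypothetical toy log: 3 users, 3 items, 4 distinct interactions.
toy_log = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u3", "c")]
toy_sparsity = sparsity(toy_log)
print(f"sparsity = {toy_sparsity:.4f}")
```

A data set that also lists users or items without any interaction would need those counts passed in explicitly, which raises the resulting sparsity.
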
6.4.2 Popularity
We encounter popularity when some items account for a considerably larger fraction of
interactions than others. Previous work has documented the occurrence of a
popularity bias in a variety of domains. These domains include movies, songs, and
books. We have grown accustomed to referring to popular items by specialized names:
“blockbuster”, “hit”, and “bestseller” denote popular movies, songs, and books, respectively.
Recommender systems consider these types of items adequate suggestions. We
expect visitors to accept suggestions of popular items. This acceptance holds insofar as users'
tastes do not deviate from those of the majority. On the other hand, users may already
be aware of the items. In such cases, the suggestion lacks serendipity. We discover
popularity biases as we analyze the distribution of interactions over items. Popularity
[Figure: two log-scaled panels, each with an axis labeled "Interactions"]
Fig. 6.3 Popularity distribution for a news portal (left) and the Movielens (right) data set
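
The analysis of how interactions are distributed over items can be sketched as follows. This is an illustrative sketch, not the procedure behind Fig. 6.3: the helper names (`item_popularity`, `count_histogram`, `top_share`) and the toy log are hypothetical. The snippet counts interactions per item, tabulates how many items share each count, and reports the share of all interactions captured by the most popular items.

```python
from collections import Counter


def item_popularity(interactions):
    """Count interactions per item from (user, item) tuples."""
    return Counter(item for _, item in interactions)


def count_histogram(popularity):
    """Map an interaction count k to the number of items with exactly k."""
    return Counter(popularity.values())


def top_share(popularity, k):
    """Fraction of all interactions that fall on the k most popular items."""
    total = sum(popularity.values())
    top = sum(count for _, count in popularity.most_common(k))
    return top / total


# Hypothetical toy log: item "a" attracts half of all interactions.
toy_log = [
    ("u1", "a"), ("u2", "a"), ("u3", "a"),
    ("u1", "b"), ("u2", "b"),
    ("u3", "c"),
]
popularity = item_popularity(toy_log)
histogram = count_histogram(popularity)
share_top1 = top_share(popularity, 1)
print(popularity.most_common(), dict(histogram), share_top1)
```

With a strong popularity bias, `top_share` approaches 1 for small `k`. If interactions were spread evenly over items, it would stay close to `k` divided by the number of items.
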