Digital Signal Processing Reference
In-Depth Information
Table 10.9 Metacritic database statistics

[#]                 Minimum   Maximum   Average   Standard deviation
Reviews per film    1         65        21.1      10.3
Words               1         104       24.2      12.4
Sentences           1         13        1.3       0.6
ning it contained 752 negative and 1 301 positive reviews from the Usenet
newsgroup rec.arts.movies.reviews. Later, other versions were added. 7 The Metacritic
film review corpus introduced in the following is far larger: it comprises a total of
102 622 reviews for 4 901 films. Metacritic 8 compiles reviews for films, video/DVDs,
books, music, television series, and computer games from various sources.
Metacritic includes each review as an excerpt of the 'key statement' from the
original review. Overall, the database contains 133 394 sentences and 2 482 605 words;
it is referred to simply as the Metacritic database in the following.
The average review has 1.3 sentences, with a standard deviation of 0.6. In contrast
to other film review databases, the reviews are thus short, at an average length of
24.2 words (cf. Table 10.9). The vocabulary comprises 83 328 words. By POS class,
nouns (683 259) come first, followed by verbs (382 822), adjectives (244 825), and
adverbs (174 152).
Besides its sheer size, a particular highlight of the database is its fine-grained
score values from 0 to 100 (the higher, the more positive) per review, calculated from the
original numeric rating scheme used by each source. These can be assumed reliable
in the sense of a ground truth rather than a mere gold standard, given that they were
assigned by the authors of the reviews. The exception is cases where no numeric
rating by the author is available; there, a Metacritic staff member provided
the score. Further, from the reader's point of view, the sentiment expressed can be perceived
differently [120, 126]. ConceptNet tries to overcome this problem by letting users
vote on the reliability of predicates, which could be used in future approaches.
Metacritic itself provides an additional ternary mapping, as can be seen in
Table 10.10. The classes are not balanced (cf. Table 10.11): roughly
three times as many positive as negative reviews are contained. The data are
partitioned into training and test sets by year, which yields sets of almost equal size:
49 698 instances are contained in the 'odd' year set, and 52 924 in the 'even' year set.
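The partitioning by year can be sketched as follows. This is a minimal illustration, not the procedure actually used for the database: the `Review` record layout, its field names, and the function `split_by_year_parity` are hypothetical, and the text does not say which year (film release or review publication) decides the split, so a single `year` field is assumed. The sample reviews are invented.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Review(NamedTuple):
    """Hypothetical record layout; the source does not define field names."""
    film: str
    year: int   # assumed: the year used for partitioning
    score: int  # Metacritic score, 0 to 100
    text: str   # the 'key statement' excerpt


def split_by_year_parity(
    reviews: Iterable[Review],
) -> Tuple[List[Review], List[Review]]:
    """Return (odd_year_reviews, even_year_reviews)."""
    odd: List[Review] = []
    even: List[Review] = []
    for review in reviews:
        # year % 2 is 1 for odd years and 0 for even years
        (odd if review.year % 2 else even).append(review)
    return odd, even


# Toy data, invented for illustration only.
sample = [
    Review("Film A", 2007, 85, "A quiet triumph."),
    Review("Film B", 2008, 35, "Tedious and overlong."),
    Review("Film C", 2005, 60, "Uneven but watchable."),
]
odd_set, even_set = split_by_year_parity(sample)
```

Splitting on year parity keeps every review from a given year in the same partition, and either set can then serve for training while the other is held out for testing.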
Table 10.10 Metacritic's mapping of score to valence classes

Score     Valence class   # Reviews
81-100    Positive        15 353
61-80     Positive        38 766
40-60     Mixed           32 586
20-39     Negative        13 194
0-19      Negative         2 723
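
The ternary mapping of Table 10.10 can be written as a small function. The following is a minimal sketch: the thresholds and review counts are taken from the table, while the function name `valence_class` and the dictionary `BAND_COUNTS` are hypothetical names introduced here for illustration.

```python
from collections import Counter


def valence_class(score: int) -> str:
    """Map a 0-100 score to the ternary valence class of Table 10.10."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 61:
        return "positive"
    if score >= 40:
        return "mixed"
    return "negative"


# Review counts per score band, copied from Table 10.10.
BAND_COUNTS = {
    (81, 100): 15353,
    (61, 80): 38766,
    (40, 60): 32586,
    (20, 39): 13194,
    (0, 19): 2723,
}

# Aggregate the five score bands into the three valence classes.
class_totals = Counter()
for (low, _high), count in BAND_COUNTS.items():
    class_totals[valence_class(low)] += count

total_reviews = sum(class_totals.values())
```

Summing the bands gives 54 119 positive, 32 586 mixed, and 15 917 negative reviews. The total of 102 622 matches the corpus size stated in the text, and the positive-to-negative ratio is about 3.4.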
7 http://www.cs.cornell.edu/people/pabo/movie-review-data/
8 http://www.metacritic.com, accessed January 2009.
 