with featured articles⁹ is 17.8% (11.4% before being marked as featured and 25.4%
after being marked as featured), while it is about 9.9% for nonfeatured articles. In
general, a policy based on the assumptions in [53] would give vandals more
incentive to contribute to high-quality pages in the hope of increasing their
reputations, and would give high-reputation users less incentive to contribute to
low-quality pages to improve their quality.
14.3.2.5 Population Coverage and Precision and Recall Issues
In related work, anonymous users are either completely ignored or assigned a static
reputation value, regardless of their behavior [37]. There are three main reasons
why we think that it is important to consider anonymous users in the reputation
estimation process: (1) About 33% of the submissions and 39% of the inserts in
Wikipedia are contributed by anonymous users and 16% of these contributions have
survived up to the last revisions of the articles; therefore they cannot be ignored;
(2) Wikipedia itself blocks IP addresses associated with anonymous vandals, and
40% of anonymous vandals are subject to infinite blocking. Therefore, an effective
reputation management system for Wikipedia should be able to identify anonymous
vandals; otherwise, a significant number of vandals will go undetected; and (3)
about 15% of data deleted from registered users is deleted by anonymous users;
hence ignoring their deletes would degrade the accuracy of the estimated reputation
for registered users.
To further verify the relevance of anonymous users, we reformulate Model 3
and assign a static reputation value to all anonymous users, as suggested in [37, 42].
Several static reputation values were tested, and the results for the new model
(Model 3′) show that the AUC always drops, for instance, by 1% when the reputation
of all anonymous users is set to 0.1. These results indicate that ignoring the
anonymous population is likely to decrease the accuracy of a reputation model.
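The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of Model 3 or Model 3′: the `User` record, the function names, and the toy population are all hypothetical, and the AUC is computed with the standard pairwise (rank-based) definition rather than with whatever tooling the authors used.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class User:
    reputation: float  # estimated reputation, assumed to lie in [0, 1]
    anonymous: bool    # True for users identified only by IP address
    vandal: bool       # ground-truth label (assumed to be available)


def auc(scores: Sequence[float], positives: Sequence[bool]) -> float:
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the usual pairwise definition of AUC).
    """
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def with_static_anonymous(users: List[User], value: float) -> List[User]:
    """Overwrite the reputation of every anonymous user with one fixed value."""
    return [replace(u, reputation=value) if u.anonymous else u for u in users]


def reputation_auc(users: List[User]) -> float:
    """AUC of reputation as a score separating good users from vandals."""
    return auc([u.reputation for u in users], [not u.vandal for u in users])


# Invented toy population, for illustration only (not data from the study).
toy_users = [
    User(0.90, anonymous=False, vandal=False),
    User(0.70, anonymous=True, vandal=False),
    User(0.60, anonymous=False, vandal=False),
    User(0.30, anonymous=False, vandal=True),
    User(0.20, anonymous=True, vandal=True),
    User(0.05, anonymous=True, vandal=True),
]

behavior_based = reputation_auc(toy_users)
static_anonymous = reputation_auc(with_static_anonymous(toy_users, 0.1))
```

In this toy population the static assignment erases the distinction between the well-behaved anonymous user and the anonymous vandals, so `static_anonymous` cannot exceed `behavior_based`. The size of the gap here says nothing about the 1% drop reported in the text.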
Evaluation results reported by Adler et al. [37] using a precision and recall
analysis also confirm this observation. Specifically, they use a model to
estimate reputation values up to time t and then estimate the precision and
recall that low reputation provides for short-lived text after time t, where the
two terms are defined as follows:
• Short-lived text is text that is almost immediately removed (only 20% of the text
  in a version survives to the next version).
• A low reputation author is an author whose reputation falls in the bottom 20% of
  the reputation scale.
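The two definitions above translate directly into a precision and recall computation. The following is a minimal sketch, assuming each edit is summarized by its author's reputation and by the fraction of its text that survives to the next version. The `Edit` record and the function names are hypothetical. Treating the 20% boundaries as inclusive and counting edits rather than weighting by amount of text are simplifying assumptions, not details taken from [37].

```python
from typing import Iterable, NamedTuple, Tuple


class Edit(NamedTuple):
    author_reputation: float   # author's reputation when the edit was made
    surviving_fraction: float  # share of the edit's text kept in the next version


def is_short_lived(edit: Edit, survival_cutoff: float = 0.2) -> bool:
    """Short-lived: at most 20% of the text survives (inclusive bound assumed)."""
    return edit.surviving_fraction <= survival_cutoff


def is_low_reputation(edit: Edit, max_reputation: float,
                      bottom_fraction: float = 0.2) -> bool:
    """Low reputation: bottom 20% of a scale assumed to run from 0 to max_reputation."""
    return edit.author_reputation <= bottom_fraction * max_reputation


def precision_recall(edits: Iterable[Edit],
                     max_reputation: float) -> Tuple[float, float]:
    """Precision and recall of 'low reputation' as a predictor of short-lived text."""
    tp = fp = fn = 0
    for edit in edits:
        predicted = is_low_reputation(edit, max_reputation)
        actual = is_short_lived(edit)
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Precision here is the share of edits by low reputation authors that turn out to be short-lived, and recall is the share of short-lived edits that came from low reputation authors.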
Table 14.4 shows the precision and recall values obtained on these data by Adler
et al. by first ignoring anonymous users (first row) and then by assigning a static
common reputation value to all anonymous users (second row). The third row
9 http://en.wikipedia.org/wiki/Featured_Article