with featured articles⁹ is 17.8% (11.4% before being marked as featured and 25.4%
after being marked as featured), while it is about 9.9% for nonfeatured articles. In
general, a policy based on the assumptions in [53] would give vandals more
incentive to contribute to high-quality pages in the hope of increasing their
reputations, and would give high-reputation users less incentive to contribute to
low-quality pages to improve their quality.
14.3.2.5 Population Coverage and Precision and Recall Issues
In related work, anonymous users are either completely ignored or assigned a
static reputation value, regardless of their behavior [37]. There are three main
reasons why we think that it is important to consider anonymous users in the
reputation estimation process: (1) About 33% of the submissions and 39% of the
inserts in Wikipedia are contributed by anonymous users and 16% of these
contributions have survived up to the last revisions of the articles; therefore
they cannot be ignored; (2) Wikipedia itself blocks IP addresses associated with
anonymous vandals, and 40% of anonymous vandals are subject to infinite blocking.
Therefore, an effective reputation management system for Wikipedia should be able
to identify anonymous vandals; otherwise, a significant number of vandals will go
undetected; and (3) about 15% of data deleted from registered users is deleted by
anonymous users; hence ignoring their deletes would degrade the accuracy of the
estimated reputation for registered users.
To further verify the relevance of anonymous users, we reformulate Model 3
and assign a static reputation value to all anonymous users, as suggested in
[37, 42]. Several static reputation values were tested and the results for the
new model (Model 3′) show that the AUC always drops, for instance, by 1% when the
reputation of all anonymous users is set to 0.1. These results indicate that
ignoring the anonymous population is likely to decrease the accuracy of a
reputation model.
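The comparison described above can be illustrated with a minimal sketch. It assumes that each user has a reputation score and a binary label (1 for a well-behaved user, 0 for a vandal), and that AUC is computed as the probability that a randomly chosen well-behaved user outranks a randomly chosen vandal. The function names and the toy data are hypothetical and are not taken from the original study; the numbers only show the mechanics, not the reported 1% drop.

```python
def auc(scores, labels):
    """Rank-based AUC: chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def with_static_anonymous(reputations, is_anonymous, static_value):
    """Replace the estimated reputation of every anonymous user by one fixed value."""
    return [static_value if anon else rep
            for rep, anon in zip(reputations, is_anonymous)]


# Toy data (made up for illustration only).
reputations = [0.9, 0.8, 0.2, 0.7, 0.1, 0.3]
is_anonymous = [False, False, False, True, True, True]
is_good_user = [1, 1, 0, 1, 0, 0]

auc_estimated = auc(reputations, is_good_user)
auc_static = auc(with_static_anonymous(reputations, is_anonymous, 0.1), is_good_user)
print(f"AUC with estimated anonymous reputations: {auc_estimated:.3f}")
print(f"AUC with static anonymous reputation 0.1: {auc_static:.3f}")
```

In this toy setting the static assignment erases the distinction between well-behaved and vandal anonymous users, so the ranking quality can only stay the same or degrade for that part of the population.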
Evaluation results reported by Adler et al. [37] using a precision and recall
analysis also confirm this observation. To be more specific, in their work they
use a model to estimate reputation values up to time t and then estimate the
precision and recall after time t provided by low reputation users for
short-lived text, which are defined as follows:
• Short-lived text is text that is almost immediately removed (only 20% of the
  text in a version survives to the next version).
• A low reputation author is an author whose reputation falls in the bottom 20%
  of the reputation scale.
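The two definitions above can be turned into a small precision and recall computation. The sketch below is an illustration under stated assumptions, not the procedure of Adler et al.: it treats "low reputation author" as the predictor and "short-lived text" as the outcome, reads "only 20% survives" as "at most 20% survives", and assumes a reputation scale from 0 to scale_max. The Edit record, the function names and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    author_reputation: float   # reputation estimated up to time t
    surviving_fraction: float  # share of the version's text kept in the next version


def is_short_lived(edit, threshold=0.2):
    """Assumed reading: at most 20% of the text survives to the next version."""
    return edit.surviving_fraction <= threshold


def is_low_reputation(edit, scale_max=1.0, bottom=0.2):
    """Reputation falls in the bottom 20% of a scale assumed to run from 0 to scale_max."""
    return edit.author_reputation <= bottom * scale_max


def precision_recall(edits):
    """Precision and recall of 'low reputation author' as a predictor of short-lived text."""
    flagged = [e for e in edits if is_low_reputation(e)]
    short_lived = [e for e in edits if is_short_lived(e)]
    true_positives = sum(1 for e in flagged if is_short_lived(e))
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(short_lived) if short_lived else 0.0
    return precision, recall


# Made-up edits observed after time t, for illustration only.
edits = [
    Edit(author_reputation=0.10, surviving_fraction=0.05),
    Edit(author_reputation=0.15, surviving_fraction=0.90),
    Edit(author_reputation=0.80, surviving_fraction=0.10),
    Edit(author_reputation=0.90, surviving_fraction=0.95),
    Edit(author_reputation=0.05, surviving_fraction=0.00),
]
precision, recall = precision_recall(edits)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Precision here answers how often text by a low reputation author turns out to be short-lived, while recall answers how much of the short-lived text was written by low reputation authors.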
Table 14.4 shows the precision and recall values obtained on these data by Adler
et al. by first ignoring anonymous users (first row) and then by assigning a static
common reputation value to all anonymous users (second row). The third row
⁹ http://en.wikipedia.org/wiki/Featured_Article