Fig. 14.1 TPRs and FPRs for the three reputation models as the classification threshold is
decreased from 1 to 0
sock-puppetry, 5 edit war, 6 advertising, or edit disruption. At the other end of the
spectrum, automatic extraction of good users beyond admins is not a trivial task.
To identify a set of good users, we focus on Wikipedia articles that are marked as
good or featured by a committee of experts. From the pool of users contributing to
these articles, we extract those who still have contributions that are live in the most
recent revisions of these articles. This definition of good users is consistent with
the result of a recent study of Wikipedia [5], which shows that the identification of
top page contributors correlates most strongly with the number of their contributed
sentences that have survived up to the most recent revision of the wiki pages.
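The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Article` record, its field names, and the `min_live` cutoff are hypothetical and do not come from the study, which does not specify its data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    # Hypothetical record; field names are illustrative, not from the study.
    title: str
    quality: str  # e.g. "good", "featured", or "other"
    # Maps each contributor to the number of their sentences that are still
    # present in the article's most recent revision.
    live_sentences: dict = field(default_factory=dict)


def extract_good_users(articles, min_live=1):
    """Return users with surviving contributions in good/featured articles."""
    good_users = set()
    for article in articles:
        # Only articles marked as good or featured are considered.
        if article.quality not in ("good", "featured"):
            continue
        for user, count in article.live_sentences.items():
            # Keep users whose contributions are still live (assumed cutoff).
            if count >= min_live:
                good_users.add(user)
    return good_users


articles = [
    Article("A", "featured", {"alice": 12, "bob": 0}),
    Article("B", "good", {"carol": 3}),
    Article("C", "other", {"dave": 40}),
]
print(sorted(extract_good_users(articles)))  # ['alice', 'carol']
```

Here "bob" is excluded because none of his sentences survive, and "dave" is excluded because his article is not marked as good or featured.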
Table 14.1 shows the AUC values for this extended classification experiment.
As in the previous results, all three models perform well and their classification
performances are comparable; however, looking at the TPRs (True Positive
Rates) and FPRs (False Positive Rates) separately (Fig. 14.1) reveals some subtle
differences. In particular, we can see that Model 1 is the best model for detecting
vandals/blocked users (lower FPR), while Model 3 is the best model for detecting
admin/good users (higher TPR).
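To make the TPR/FPR comparison concrete, the following is a minimal sketch of how such rates can be computed as the classification threshold is lowered from 1 to 0, together with a trapezoidal AUC estimate. It assumes that reputation values lie in [0, 1], that a user is classified as good when the reputation is at or above the threshold, and that admin/good users are the positive class. The scores and labels are made-up toy data, not values from the experiment.

```python
def roc_points(scores, labels, thresholds):
    """Return one (FPR, TPR) pair per threshold.

    A user is classified as good (positive) when score >= threshold.
    labels: 1 for admin/good users, 0 for vandal/blocked users.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Trapezoidal area under the (FPR, TPR) curve."""
    pts = sorted([(0.0, 0.0)] + points + [(1.0, 1.0)])
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Toy reputation values and ground-truth labels (illustrative only).
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0]
thresholds = [1.0, 0.75, 0.5, 0.25, 0.0]  # decreasing from 1 to 0

pts = roc_points(scores, labels, thresholds)
for t, (fpr, tpr) in zip(thresholds, pts):
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  TPR={tpr:.2f}")
print(f"AUC={auc(pts):.3f}")
```

Under these assumptions, a model with a lower FPR at a given threshold lets fewer blocked users pass as good, while a model with a higher TPR recognizes more of the good users, which is the distinction drawn between Model 1 and Model 3.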
Table 14.2 compares the mean and standard deviation of the reputation values
for good users and admins against blocked users. In general, all three models assign
high reputation values to admins/good users and low reputation values to blocked
users, but the distribution of assigned reputations (Fig. 14.2) confirms that Model 1
5 http://en.wikipedia.org/wiki/Wikipedia:Sock_puppetry
6 http://en.wikipedia.org/wiki/Edit_warring