inserted up to and right before the time $t$. Using a “+” superscript to denote values immediately after the time $t$, we have

$$N_i^+(t) = N_i(t) + c_i(t) \quad \text{and} \quad n_i^+(t) = n_i(t) + g_i(t).$$

We can now define three different models of reputation.
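The update rule is just a pair of running sums. The sketch below replays a hypothetical stream of revisions, each recorded as a `(user, c, g)` triple: `c` is the number of tokens the user inserts at that revision and `g` the number of those counted as good. How a token is classified as good is defined outside this section and is simply taken as given here; the function name and input format are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def replay_counts(revisions: Iterable[Tuple[str, int, int]]) -> Dict[str, Tuple[int, int]]:
    """Accumulate (N_i, n_i) per user via N_i^+ = N_i + c_i and n_i^+ = n_i + g_i.

    Each revision is a hypothetical (user, c, g) triple with 0 <= g <= c.
    """
    N: Dict[str, int] = defaultdict(int)  # total tokens inserted so far
    n: Dict[str, int] = defaultdict(int)  # good tokens inserted so far
    for user, c, g in revisions:
        if not 0 <= g <= c:
            raise ValueError("expected 0 <= g <= c")
        N[user] += c  # N_i^+(t) = N_i(t) + c_i(t)
        n[user] += g  # n_i^+(t) = n_i(t) + g_i(t)
    return {u: (N[u], n[u]) for u in N}


counts = replay_counts([("alice", 10, 8), ("bob", 5, 0), ("alice", 4, 4)])
print(counts)  # {'alice': (14, 12), 'bob': (5, 0)}
```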
Model 1: In the first model, reputation is simply the fraction of inserted tokens that are good. In this model, we have
$$R_i(t) = \frac{n_i^+(t)}{N_i^+(t)} = \frac{n_i(t) + g_i(t)}{N_i(t) + c_i(t)} \tag{14.1}$$
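Equation (14.1) is the ratio of the two updated counters. A minimal sketch follows; the function name is hypothetical, and returning `None` for a user with no insertions yet is an assumption, since the text does not say how that case is handled.

```python
from typing import Optional


def model1_reputation(N: int, n: int, c: int, g: int) -> Optional[float]:
    """Model 1: R_i(t) = (n_i(t) + g_i(t)) / (N_i(t) + c_i(t))."""
    total = N + c  # N_i^+(t)
    if total == 0:
        return None  # assumption: reputation is undefined before any insertion
    return (n + g) / total  # n_i^+(t) / N_i^+(t)


# 12 good tokens out of 16 so far, then 4 new tokens of which 2 are good.
print(model1_reputation(N=16, n=12, c=4, g=2))  # 0.7
```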
Model 2: While the first model appears reasonable, it treats all deleted tokens uniformly. In reality, the time at which a deletion occurs carries information. Vandalistic insertions, for instance, tend to be removed very rapidly [23, 45, 46]. According to our study on Wikipedia, 76% of vandalism is reverted in the very next revision.
Insertions that are deleted only after a very long period of time tend to be deleted because they are outdated rather than of poor quality. Thus, in general, we arrive at the hypothesis that the quicker a token is deleted, the more likely it is to be of poor quality. To realize this hypothesis, we propose a variation on Model 1 in which deleted tokens introduce a penalty in the numerator, with an exponential time decay controlled by a single parameter $a$.
$$R_i(t) = \frac{n_i(t) + g_i(t) - \sum_{d=1}^{p_i(t)} e^{-a(t - t_d)}}{N_i(t) + c_i(t)} \tag{14.2}$$
Here, $t_d$ represents the time at which the corresponding token was deleted. Since the update rate can vary among wiki pages, we measure time intervals in numbers of revisions. We trained $R_i(t)$ over different values of $a$ in order to maximize the area under the ROC curve (AUC). The results show that $a = 0.1$ gives the best result.
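Model 2 and the tuning of $a$ can be sketched together. The code assumes each user's deleted tokens are given as a list of deletion times $t_d$ in revisions, and that the penalty for one deleted token, evaluated at revision $t$, is $e^{-a(t - t_d)}$. AUC is computed as the fraction of (positive, negative) pairs ranked correctly, which equals the Mann-Whitney U statistic normalized by the number of pairs. The user records, the binary labels (1 = hypothetical "trusted" user, 0 = hypothetical "vandal"), the candidate grid, and all function names are made up for illustration and are not the study's data or code.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-user record: (N, n, c, g, deletion_times)
User = Tuple[int, int, int, int, List[int]]


def model2_reputation(N: int, n: int, c: int, g: int,
                      deletion_times: Sequence[int], t: int, a: float) -> float:
    """Model 2 sketch: each deleted token subtracts exp(-a * (t - t_d))."""
    if N + c == 0:
        raise ValueError("reputation is undefined before any insertion")
    penalty = sum(math.exp(-a * (t - t_d)) for t_d in deletion_times)
    return (n + g - penalty) / (N + c)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def tune_a(users: Sequence[User], labels: Sequence[int], t: int,
           grid: Sequence[float]) -> float:
    """Grid search: return the a in `grid` whose Model 2 scores maximize AUC."""
    def score(a: float) -> float:
        reps = [model2_reputation(N, n, c, g, dels, t, a)
                for N, n, c, g, dels in users]
        return auc(reps, labels)
    return max(grid, key=score)


# Made-up toy data; times are revision indices, evaluated at revision t = 50.
users: List[User] = [
    (40, 35, 5, 5, [2, 30]),
    (30, 20, 5, 4, [10, 11, 12]),
    (20, 5, 5, 0, [48, 49, 49, 50]),
    (25, 15, 5, 2, [45, 46, 47]),
]
labels = [1, 1, 0, 0]
grid = [0.01, 0.05, 0.08, 0.1, 0.2, 0.5]
print("best a on toy data:", tune_a(users, labels, t=50, grid=grid))
```

The pairwise AUC is quadratic in the number of labeled users; for large label sets, a sort-based rank-sum computation gives the same value in $O(m \log m)$.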
Model 3: This model is a variation of Model 2 in which we also take into account the reputation of the deleter, using his/her reputation to weight the corresponding deletion:
$$R_i(t) = \frac{n_i(t) + g_i(t) - \sum_{d=1}^{p_i(t)} R_{j(t_d)}(t_d)\, e^{-a(t - t_d)}}{N_i(t) + c_i(t)} \tag{14.3}$$
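Model 3 scales each deletion's penalty by the deleter's reputation at the moment of deletion. The sketch below assumes each deletion is given as a pair `(t_d, r_del)`, where `r_del` stands for $R_{j(t_d)}(t_d)$, the reputation of the deleting user at time $t_d$, and that the time-decay factor is $e^{-a(t - t_d)}$ with times in revisions. The function name and input format are hypothetical.

```python
import math
from typing import Sequence, Tuple


def model3_reputation(N: int, n: int, c: int, g: int,
                      deletions: Sequence[Tuple[int, float]],
                      t: int, a: float) -> float:
    """Model 3 sketch: each deletion subtracts r_del * exp(-a * (t - t_d)).

    `deletions` holds hypothetical (t_d, deleter_reputation) pairs.
    """
    if N + c == 0:
        raise ValueError("reputation is undefined before any insertion")
    penalty = sum(r_del * math.exp(-a * (t - t_d)) for t_d, r_del in deletions)
    return (n + g - penalty) / (N + c)


# Same deletion time, different deleters: a high-reputation deleter costs more.
by_admin = model3_reputation(10, 8, 0, 0, [(10, 0.9)], t=10, a=0.08)
by_vandal = model3_reputation(10, 8, 0, 0, [(10, 0.1)], t=10, a=0.08)
print(by_admin < by_vandal)  # True
```

If every deleter had reputation 1, this reduces to the unweighted penalty of Equation (14.2).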
The idea behind this variation of the model is to value the deletions performed by high-reputation users (e.g., admins) and devalue the deletions performed by low-reputation users (e.g., vandals). In Model 3, $a = 0.08$ gives the maximum AUC.
For users who start with a delete action, we need to know the initial value $R_i(0)$. If we denote by $T$ the final time, experiments show that the fastest convergence from