form of classifiers, on the available "ground truth" associated with Wikipedia's known administrators and vandals.
This general approach, which is fairly standard in machine learning applications, requires some explanation. It is reasonable to assume that there exists a true
reputation function that is scaled between 0 and 1 and increases monotonically
from the user with the lowest reputation to the user with the highest reputation. Our
work is an attempt to approximate this unknown function. The only ground truth
available to us concerning this function comes in the form of two extreme datasets
of users: the vandals and the admins. No ground-truth data are available for
individuals in the middle range of the spectrum. Thus, to approximate the true
unknown reputation function, our first focus is on testing whether the proposed
models behave well on the two extreme populations. The models we propose have
very few free parameters and they are used to predict reputation values for large
numbers of admins and vandals. Once a model capable of producing an output
between 0 and 1 for each user has been shown to perform well on the two extreme
populations, it is also reasonable to ask whether it performs well on other users.
Since no ground truth is available for these users, only indirect evidence can be
provided regarding the corresponding performance of the model. Indirect, yet very
significant, evidence can be provided in a number of different ways including
assessment with respect to other models and datasets proposed in the relevant
literature, and results obtained on curated datasets that go beyond the available
admin/vandal data. These are precisely the kinds of analyses that are described in
the following sections.
In order to estimate users' reputations, we deconstruct edit actions into inserts and deletes. We consider the stability of the inserts made by a user, i.e., the fraction of inserted tokens that remain, to be an estimate of his/her reputation. Although the stability of deletes could also be considered as another source of information, it has several shortcomings. Wikipedia is driven more by inserts than by deletes: the total size of inserts is 1.6 times that of deletes. Moreover, deletes are more difficult to track, so computing the stability of deletes is both noisier and more computationally expensive. Hence, we assume that using only the stability of inserts yields a reliable estimate of users' reputation values.
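As a rough illustration of this idea, the stability of a user's inserts can be computed as the fraction of inserted tokens that survive. The following sketch is our own, under assumed data shapes, and is not the authors' implementation:

```python
# Illustrative sketch (not the original implementation): reputation as
# insert stability. Each insert is represented as an assumed pair
# (tokens_inserted, tokens_surviving).

def insert_stability(inserts):
    """Return the fraction of a user's inserted tokens that survived,
    a value in [0, 1] used here as a rough reputation estimate."""
    total = sum(n for n, _ in inserts)
    kept = sum(k for _, k in inserts)
    return kept / total if total else 0.0

# A user who inserted 100 tokens of which 90 survived, then 50 of which
# 30 survived, has stability 120/150 = 0.8.
print(insert_stability([(100, 90), (50, 30)]))  # 0.8
```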
Consider a user i who at time t inserts c_i(t) tokens into a Wikipedia page. It is reasonable to assume that the update ΔR_i(t) of the reputation R_i(t) should depend on the quality of the tokens inserted at time t. To assess the quality of each token, let t_0 represent the first time point, after t, at which an administrator (hereafter referred to as "admin") checks the current status of a wiki page by submitting a new revision. According to English Wikipedia history dumps, admins on average submit about 11% of the revisions of a page, distributed over the page's life cycle.
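To make the definition of t_0 concrete, here is a minimal sketch (our own, not from the original work) that finds the first admin revision strictly after an insert at time t, assuming the page's admin revision timestamps are available as a sorted list:

```python
from bisect import bisect_right

def first_admin_check(admin_revision_times, t):
    """Return t_0, the time of the first admin revision strictly after
    time t, or None if no admin has revised the page since t.
    admin_revision_times must be sorted in ascending order."""
    i = bisect_right(admin_revision_times, t)
    return admin_revision_times[i] if i < len(admin_revision_times) else None

times = [5, 20, 60]          # assumed admin revision timestamps
print(first_admin_check(times, 10))  # 20
print(first_admin_check(times, 60))  # None
```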
By definition (or, rather, by approximation), a token inserted at time t is of good quality if it is still present after the intervention of the admin at time t_0; otherwise it
is considered to be of poor quality. Therefore, we have c_i(t) = g_i(t) + p_i(t), where g_i(t) (resp. p_i(t)) represents the number of good-quality (resp. poor-quality) tokens. For user i, we also let N_i(t) be the total number of tokens inserted up to and right before the time t and, similarly, let n_i(t) be the number of good-quality tokens
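The good/poor split of inserted tokens can be sketched as follows; this is an illustrative Python fragment, not the authors' code, and it assumes tokens can be tracked by identity (e.g., unique token ids) between revisions:

```python
def split_good_poor(inserted_tokens, tokens_present_at_t0):
    """Classify tokens inserted at time t: good if still present at t_0
    (the first admin revision after t), poor otherwise.
    Returns (g, p), so that g + p equals c_i(t), the number inserted."""
    good = sum(1 for tok in inserted_tokens if tok in tokens_present_at_t0)
    poor = len(inserted_tokens) - good
    return good, poor

# Tokens "a" and "c" survive until the admin revision; "b" does not.
g, p = split_good_poor(["a", "b", "c"], {"a", "c", "x"})
print(g, p)  # 2 1
```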