form of classifiers, on the available "ground truth" associated with Wikipedia's known administrators and vandals.
This general approach, which is fairly standard in machine learning applications, requires some explanation. It is reasonable to assume that there exists a true
reputation function that is scaled between 0 and 1 and increases monotonically
from the user with the lowest reputation to the user with the highest reputation. Our
work is an attempt to approximate this unknown function. The only ground truth
available to us concerning this function comes in the form of two extreme datasets
of users: the vandals and the admins. No ground-truth data are available for
individuals in the middle range of the spectrum. Thus, to approximate the true
unknown reputation function, our first focus is on testing whether the proposed
models behave well on the two extreme populations. The models we propose have
very few free parameters and they are used to predict reputation values for large
numbers of admins and vandals. Once a model capable of producing an output
between 0 and 1 for each user has been shown to perform well on the two extreme
populations, it is also reasonable to ask whether it performs well on other users.
Since no ground truth is available for these users, only indirect evidence can be
provided regarding the corresponding performance of the model. Indirect, yet very
significant, evidence can be provided in a number of different ways including
assessment with respect to other models and datasets proposed in the relevant
literature, and results obtained on curated datasets that go beyond the available
admin/vandal data. These are precisely the kinds of analyses that are described in
the following sections.
In order to estimate users' reputations, we deconstruct edit actions into inserts and deletes. We consider the stability of the inserts made by a user, i.e., the fraction of inserted tokens that remain, to be an estimate of his/her reputation. Although the stability of deletes could also be considered as another source of information, it has several shortcomings. Wikipedia is driven more by inserts than by deletes: the total size of inserts is 1.6 times that of deletes. Moreover, deletes are more difficult to track, so computing the stability of deletes is both noisier and more computationally expensive. Hence, we assume that using only the stability of inserts yields a reliable estimate of users' reputation values.
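As a rough illustration of this idea, the stability of a user's inserts can be computed as the fraction of inserted tokens that survive. The following sketch is our own, under assumed data shapes, and is not the authors' implementation:

```python
# Illustrative sketch (not the original implementation): reputation as
# insert stability. Each insert is represented as an assumed pair
# (tokens_inserted, tokens_surviving).

def insert_stability(inserts):
    """Return the fraction of a user's inserted tokens that survived,
    a value in [0, 1] used here as a rough reputation estimate."""
    total = sum(n for n, _ in inserts)
    kept = sum(k for _, k in inserts)
    return kept / total if total else 0.0

# A user who inserted 100 tokens of which 90 survived, then 50 of which
# 30 survived, has stability 120/150 = 0.8.
print(insert_stability([(100, 90), (50, 30)]))  # 0.8
```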
Consider a user i who at time t inserts c_i(t) tokens into a Wikipedia page. It is reasonable to assume that the update ΔR_i(t) of the reputation R_i(t) should depend on the quality of the tokens inserted at time t. To assess the quality of each token, let t_0 represent the first time point, after t, at which an administrator (hereafter referred to as "admin") checks the current status of a wiki page by submitting a new revision. According to English Wikipedia history dumps, admins on average submit about 11% of the revisions of a page, distributed over the page's life cycle.
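To make the definition of t_0 concrete, here is a minimal sketch (our own, not from the original work) that finds the first admin revision strictly after an insert at time t, assuming the page's admin revision timestamps are available as a sorted list:

```python
from bisect import bisect_right

def first_admin_check(admin_revision_times, t):
    """Return t_0, the time of the first admin revision strictly after
    time t, or None if no admin has revised the page since t.
    admin_revision_times must be sorted in ascending order."""
    i = bisect_right(admin_revision_times, t)
    return admin_revision_times[i] if i < len(admin_revision_times) else None

times = [5, 20, 60]          # assumed admin revision timestamps
print(first_admin_check(times, 10))  # 20
print(first_admin_check(times, 60))  # None
```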
By definition (or, rather, by approximation), a token inserted at time t is of good quality if it is still present after the intervention of the admin at time t_0; otherwise it
is considered to be of poor quality. Therefore, we have c_i(t) = g_i(t) + p_i(t), where g_i(t) (resp. p_i(t)) represents the number of good-quality (resp. poor-quality) tokens. For user i, we also let N_i(t) be the total number of tokens inserted up to and right before the time t and, similarly, let n_i(t) be the number of good-quality tokens
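The good/poor split of inserted tokens can be sketched as follows; this is an illustrative Python fragment, not the authors' code, and it assumes tokens can be tracked by identity (e.g., unique token ids) between revisions:

```python
def split_good_poor(inserted_tokens, tokens_present_at_t0):
    """Classify tokens inserted at time t: good if still present at t_0
    (the first admin revision after t), poor otherwise.
    Returns (g, p), so that g + p equals c_i(t), the number inserted."""
    good = sum(1 for tok in inserted_tokens if tok in tokens_present_at_t0)
    poor = len(inserted_tokens) - good
    return good, poor

# Tokens "a" and "c" survive until the admin revision; "b" does not.
g, p = split_good_poor(["a", "b", "c"], {"a", "c", "x"})
print(g, p)  # 2 1
```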