Trust in Online Collaborative IS - Community-Built Databases: Research and Development


with featured articles (see footnote 9) is 17.8% (11.4% before being marked as featured and 25.4%

after being marked as featured), while it is about 9.9% for nonfeatured articles. In

general, a policy based on the assumptions in [ 53 ] would result in vandals having

more incentives to contribute to high-quality pages hoping to increase their reputations,

and high-reputation users having fewer incentives to contribute to low-quality

pages to improve their quality.

14.3.2.5 Population Coverage and Precision and Recall Issues

In related work, anonymous users are either completely ignored or assigned a static

reputation value, regardless of their behavior [ 37 ]. There are three main reasons

why we think that it is important to consider anonymous users in the reputation

estimation process: (1) About 33% of the submissions and 39% of the inserts in

Wikipedia are contributed by anonymous users and 16% of these contributions have

survived up to the last revisions of the articles; therefore they cannot be ignored;

(2) Wikipedia itself blocks IP addresses associated with anonymous vandals, and

40% of anonymous vandals are subject to infinite blocking. Therefore, an effective

reputation management system for Wikipedia should be able to identify anonymous

vandals; otherwise, a significant number of vandals will go undetected; and (3)

about 15% of data deleted from registered users is deleted by anonymous users;

hence ignoring their deletes would degrade the accuracy of the estimated reputation

for registered users.

To further verify the relevance of anonymous users, we reformulate Model 3

and assign a static reputation value to all anonymous users, as suggested in [ 37 , 42 ].

Several static reputation values were tested and the results for the new model

(Model 3′) show that the AUC always drops, for instance, by 1% when the reputation

of all anonymous users is set to 0.1. These results indicate that ignoring the

anonymous population is likely to decrease the accuracy of a reputation model.
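The comparison above can be illustrated with a minimal sketch. It assumes that reputation scores are used to rank users against a binary ground-truth label (here a hypothetical `good` flag, 1 for a benign user and 0 for a vandal) and that AUC is computed as the probability that a randomly chosen benign user outranks a randomly chosen vandal. The helper names `auc` and `with_static_anonymous` and the toy records are illustrative assumptions, not the models or data used in the chapter.

```python
def auc(scores, labels):
    """Rank-based AUC: probability that a positive example scores higher
    than a negative one, counting ties as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def with_static_anonymous(users, static_value):
    """Replace the behavior-based reputation of every anonymous user
    with one fixed value; registered users keep their own score."""
    return [
        static_value if u["anonymous"] else u["reputation"]
        for u in users
    ]


# Hypothetical toy population (not taken from the chapter's data set).
users = [
    {"reputation": 0.90, "anonymous": False, "good": 1},
    {"reputation": 0.70, "anonymous": True,  "good": 1},
    {"reputation": 0.60, "anonymous": False, "good": 1},
    {"reputation": 0.30, "anonymous": False, "good": 0},
    {"reputation": 0.20, "anonymous": True,  "good": 0},
    {"reputation": 0.05, "anonymous": True,  "good": 0},
]
labels = [u["good"] for u in users]

auc_behavioral = auc([u["reputation"] for u in users], labels)
auc_static = auc(with_static_anonymous(users, 0.1), labels)
print(f"behavior-based: {auc_behavioral:.3f}, static anonymous: {auc_static:.3f}")
```

In this toy setting the static assignment makes benign and malicious anonymous users indistinguishable, so every such pair contributes only a tie to the AUC. The size of the drop depends entirely on the data; the 1% figure quoted above comes from the chapter's experiments, not from this sketch.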

Evaluation results reported by Adler et al. [ 37 ] using a precision and recall

analysis also confirm this observation. To be more specific, in their work they use a

model to estimate reputation values up to time t and then estimate the precision and

recall after time t provided by low-reputation users for short-lived text, where the two terms are

defined as follows:

• Short-lived text is text that is almost immediately removed (only 20% of the text

in a version survives to the next version).

• A low reputation author is an author whose reputation falls in the bottom 20% of

the reputation scale.
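The two definitions above can be turned into a small precision and recall computation. The following is a sketch under stated assumptions: each edit is counted once (rather than being weighted by the amount of text), "only 20% survives" is read as a surviving fraction of at most 0.2, and the reputation scale runs from 0 to a known maximum. The `Edit` record and the function names are hypothetical and are not the implementation of Adler et al.

```python
from dataclasses import dataclass


@dataclass
class Edit:
    author_reputation: float   # author's reputation at time t, in [0, max_reputation]
    surviving_fraction: float  # share of the edit's text present in the next version


def is_short_lived(edit, threshold=0.2):
    """Assumed reading: at most 20% of the text survives to the next version."""
    return edit.surviving_fraction <= threshold


def is_low_reputation(edit, max_reputation=1.0, bottom_share=0.2):
    """Author's reputation lies in the bottom 20% of the reputation scale."""
    return edit.author_reputation <= bottom_share * max_reputation


def precision_recall(edits, max_reputation=1.0):
    """Treat 'low-reputation author' as a predictor of 'short-lived text'.

    precision = short-lived edits by low-reputation authors / all edits by low-reputation authors
    recall    = short-lived edits by low-reputation authors / all short-lived edits
    """
    flagged = [e for e in edits if is_low_reputation(e, max_reputation)]
    short = [e for e in edits if is_short_lived(e)]
    hits = [e for e in flagged if is_short_lived(e)]
    precision = len(hits) / len(flagged) if flagged else 0.0
    recall = len(hits) / len(short) if short else 0.0
    return precision, recall


# Hypothetical edits made after time t (illustrative values only).
sample_edits = [
    Edit(author_reputation=0.10, surviving_fraction=0.00),
    Edit(author_reputation=0.15, surviving_fraction=0.90),
    Edit(author_reputation=0.80, surviving_fraction=0.10),
    Edit(author_reputation=0.90, surviving_fraction=1.00),
    Edit(author_reputation=0.05, surviving_fraction=0.20),
]
precision, recall = precision_recall(sample_edits)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Ignoring anonymous users corresponds to dropping their edits from the input list before calling `precision_recall`, whereas assigning them a static reputation keeps their edits but fixes `author_reputation` to one common value.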

Table 14.4 shows the precision and recall values obtained on these data by Adler

et al. by first ignoring anonymous users (first row) and then by assigning a static

common reputation value to all anonymous users (second row). The third row

9 http://en.wikipedia.org/wiki/Featured_Article
