dynamics in the evolution of the content in these articles that allows only very high-
quality content to survive. Note that in order to control the increased visibility and
attention that articles might gain after being marked as featured, we have also
reported the results of our analysis both before and after becoming featured.
Featured articles can also be distinguished from other articles in terms of
proportion of reverted revisions. While, on average, 9.9% of the revisions in
nonfeatured articles are reverted, this proportion rises to 25.4% after an article
becomes featured. This significant increase in the reversion ratio after articles are
marked as featured warrants further study: it may be due to increased vandalism
resulting from the higher visibility, or it may reflect the fact that most featured
articles have matured and thus become more resistant to change.
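To make the notion of a "reverted revision" concrete, a common heuristic (not necessarily the exact method used in this study) flags a revision as reverted when a later revision restores the article to a byte-identical earlier state, detected by hashing the text of each revision:

```python
import hashlib

def find_reverted_revisions(revision_texts):
    """Identify reverted revisions via content-hash matching.

    A revision is flagged as reverted if a later revision restores the
    article to an earlier, byte-identical state. This is a common
    heuristic, not necessarily the one used in the study above.
    Returns the set of indices of reverted revisions.
    """
    seen = {}          # content hash -> index of latest revision with that content
    reverted = set()
    for i, text in enumerate(revision_texts):
        h = hashlib.md5(text.encode("utf-8")).hexdigest()
        if h in seen:
            # Every revision strictly between the earlier identical
            # state and this one was undone by this revert.
            reverted.update(range(seen[h] + 1, i))
        seen[h] = i
    return reverted

def revert_ratio(revision_texts):
    """Fraction of revisions that were later reverted."""
    if not revision_texts:
        return 0.0
    return len(find_reverted_revisions(revision_texts)) / len(revision_texts)
```

For a revision history whose texts are `["A", "B", "A"]`, the middle revision is flagged as reverted, giving a revert ratio of 1/3.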
In summary, we conclude that (a) featured articles are more closely followed:
although less than 0.08% of the articles are marked as featured, they comprise about
1.4% of the total number of revisions; (b) Wikipedia administrators contribute more
actively to featured articles even before these articles are marked as featured; (c) the
revert ratio in featured articles is about 1.8 times higher than the ratio for non-
featured articles; and (d) featured articles have a much higher turnover of content.
This higher turnover in an article's evolution allows only very high-quality content
to survive. Interestingly, even at this lower survival rate, featured articles
are on average longer than other articles [38]. Overall, these statistics support the
view that featured articles benefit from a higher degree of supervision as compared
to other articles.
14.5 Tools and Methods
To obtain the data for our study, we used five client machines for a period of
2.5 months during the summer of 2009 to send requests to the MediaWiki API and
extract the data. By sending consecutive requests to the API, one can obtain the text of
all revisions of each Wikipedia article. We needed the list of the articles in English
Wikipedia to feed to the API in order to obtain article revisions. However, a
significant number of Wikipedia articles are merely redirects to other articles, so
we ignored them. To obtain a clean list of Wikipedia articles, we used
crawler4j [ 56 ] to crawl English Wikipedia and extract the list of nonredirected
articles. We started from the Wikipedia main page and some other seed pages and
by traversing the links we crawled about 1.9 million articles. We also used the
MediaWiki API to extract different types of contributors, such as bots,12 admins,
and blocked users. Table 14.6 shows the properties of the dataset.
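The extraction pipeline above can be sketched as follows. Redirect pages in MediaWiki wikitext begin with `#REDIRECT [[Target]]`, so filtering on that prefix is one way to keep only nonredirected articles; fetching all revisions of an article then requires paging through the API's continuation mechanism. This is a sketch of the idea under current API conventions (`action=query`, `prop=revisions`, `rvcontinue`), not the crawler4j-based pipeline used in the study, and the HTTP call is injected so the loop can be exercised without network access:

```python
import re

# Redirect pages begin with "#REDIRECT [[Target]]" (case-insensitive);
# dropping them yields a list of nonredirected articles.
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*\[\[", re.IGNORECASE)

def is_redirect(wikitext):
    return bool(REDIRECT_RE.match(wikitext))

def fetch_all_revisions(title, fetch_json, batch_size=500):
    """Collect every revision of one article by paging through the API.

    fetch_json(params) is expected to perform the HTTP GET against
    https://en.wikipedia.org/w/api.php and return the decoded JSON;
    injecting it keeps the continuation loop testable offline.
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|content",
        "rvlimit": batch_size,
    }
    revisions = []
    while True:
        data = fetch_json(dict(params))
        for page in data["query"]["pages"].values():
            revisions.extend(page.get("revisions", []))
        cont = data.get("continue")
        if cont is None:
            return revisions
        params.update(cont)  # adds e.g. "rvcontinue" for the next batch
```

Injecting the transport also makes it easy to add the throttling and retry logic that a 2.5-month crawl from multiple client machines would need in practice.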
A note about "users": it is virtually impossible to associate actual persons with
online behavior in a one-to-one fashion. To bypass this problem, Wikipedia
12 Bots are generally programs or scripts that make automated edits without the necessity of human
decision-making.