Table 14.6 Properties of the dataset

Time span                  96 months
Number of users            12,797,391
  Registered users         1,749,146
  Anonymous users          11,048,245
Number of articles         1,899,622
  Featured                 2,650
  Good users               197,436
  Good                     7,502
  Good users               334,369
  For deletion             125
  Regular                  1,889,345
Number of revisions        123,938,034
  By anonymous users       82,577,828
  By registered users      41,360,206

Wikipedia defines two classes of users. An anonymous user is a user who is known only
through his/her IP address. A registered user is a user identified by the
username (i.e., nickname) that was entered during the registration process. We,
as well as others [23, 42, 54], follow the same nomenclature as Wikipedia: a user in
this study refers to a registered account or an IP address, and it does not refer to a
real-world individual.
14.5.1 Extracting Reverts
A revert is an action that undoes all changes made to an article, restoring an
earlier revision; it is primarily used for fighting vandalism. To extract reverts,
we compare the text of each revision to the text of the previous revisions. Since
comparing full texts is computationally expensive, the comparison is done on the
MD5 signatures of the texts rather than on the texts themselves.
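
The signature-based comparison described above can be sketched as follows. This is a minimal illustration, not the tool used in the study: the function name find_reverts and its input format (a chronological list of revision texts for one article) are assumptions, and the rule that an immediate repeat of the previous revision does not count as a revert is our own simplification.

```python
import hashlib


def find_reverts(revision_texts):
    """Return (reverting_index, restored_index) pairs for one article.

    Hypothetical helper: revision_texts is assumed to be the article's
    revisions in chronological order.
    """
    last_seen = {}  # MD5 digest -> index of the latest revision with that text
    reverts = []
    for i, text in enumerate(revision_texts):
        # Compare fixed-size signatures instead of the full texts.
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        previous = last_seen.get(digest)
        # Assumption: an exact repeat of the immediately preceding revision
        # is treated as a null edit rather than a revert.
        if previous is not None and previous < i - 1:
            reverts.append((i, previous))
        last_seen[digest] = i
    return reverts


history = ["a b", "a b c", "vandal", "a b c", "a b c d"]
print(find_reverts(history))
```

Here revision 3 carries the same signature as revision 1, so it is reported as a revert that undoes revision 2. Keeping a dictionary of signatures makes each lookup constant time, which is the point of hashing the texts in the first place.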
14.5.2 Extracting Events
We consider an atomic event to be an insertion or deletion of a word. Insertions are
extracted by comparing the text of each revision with the text of the previous
revision; deletions are extracted by comparing the text in a revision with the text of
all the subsequent revisions. We use the diff algorithm described in [50] for
accurate extraction of atomic events. The advantage of this algorithm over
most current diff algorithms is its ability to detect movements of blocks.
The developed tool, named Wikipedia Event Extractor, is publicly available at [57].
We calculated R_i(T) of users by processing the extracted events.
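
To make the notion of an atomic event concrete, the sketch below extracts word-level insertions and deletions between two adjacent revisions. It uses Python's standard difflib.SequenceMatcher as a stand-in, not the algorithm of [50]: it does not detect block movements (a moved block shows up as a deletion plus an insertion), and it compares only one pair of revisions rather than a revision against all subsequent ones. The function name word_events is hypothetical.

```python
import difflib


def word_events(old_text, new_text):
    """Return (inserted_words, deleted_words) between two revisions.

    Hypothetical helper; words are taken to be whitespace-separated tokens.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    inserted, deleted = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" opcode counts as deletions plus insertions.
        if tag in ("delete", "replace"):
            deleted.extend(old_words[i1:i2])
        if tag in ("insert", "replace"):
            inserted.extend(new_words[j1:j2])
    return inserted, deleted


inserted, deleted = word_events("the quick brown fox",
                                "the slow brown fox jumps")
print(inserted, deleted)
```

Each word in the two returned lists corresponds to one atomic event attributable to the author of the newer revision.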