14.3.2 Comparison to Related Work
In this section, we discuss our model in more detail and compare it to related work
in the literature according to several criteria, each of which appears in boldface
below:
14.3.2.1 Tracking Token Ownership
Effective assignment of inserts and deletes to owners is highly dependent on (1) the
accuracy of the diff algorithm used for calculating the distance between two
revisions of a wiki page; and (2) the side effects of reverts resulting in incorrect
ownership assignments. An effective diff algorithm for wikis should identify
differences in a way that is meaningful to human readers. In particular, reordering
of text blocks should be detected in order to accurately assign ownership to the
tokens in the reordered blocks. This issue has not been taken into consideration in
some of the previous work [ 5 , 34 , 47 ]. For example, Sabel et al. [ 34 ] use the
Levenshtein algorithm 7 to compute the edit distance between two revisions. This
algorithm penalizes block reordering and as a result each token that has been shifted
is usually considered deleted from its old position and inserted in its new position
[ 48 , 49 ]. In our experience, Wikipedia's diff algorithm can suffer from the same
problem, occasionally failing to detect block reorderings. We and others
[ 37 ] overcome this problem by using efficient diff algorithms that detect reordering
of blocks and run in time and space linear in the size of the input [ 50 , 51 ].
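
To illustrate why plain edit distance penalizes block reordering, the following minimal sketch computes a token-level Levenshtein distance for a revision in which two blocks merely swap places. The function name and the sample token lists are hypothetical and chosen only for illustration; this is neither the implementation used in [ 34 ] nor the linear-time algorithms of [ 50 , 51 ].

```python
def token_levenshtein(old_tokens, new_tokens):
    """Classic dynamic-programming edit distance over token lists.

    Illustrative only: insertions, deletions and substitutions each cost 1,
    and there is no operation for moving a block of tokens.
    """
    prev = list(range(len(new_tokens) + 1))
    for i, old_tok in enumerate(old_tokens, start=1):
        curr = [i]
        for j, new_tok in enumerate(new_tokens, start=1):
            substitution = prev[j - 1] + (0 if old_tok == new_tok else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


# Hypothetical revisions: the same four tokens, with two blocks swapped.
old_revision = "alpha beta gamma delta".split()
new_revision = "gamma delta alpha beta".split()

distance = token_levenshtein(old_revision, new_revision)
same_tokens = sorted(old_revision) == sorted(new_revision)
```

Here `same_tokens` is true, yet `distance` equals 4, the number of tokens in the revision: no token was actually added or removed, but every one of them is charged as an edit. An ownership tracker built on such a distance would reassign the moved tokens to whoever reordered them.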
Another issue in accurate assignment of token ownership has to do with taking
into account the side effects of reverts. In general, successive revisions of a wiki
page have similar content, and each revision, except the very first, is a descendant of
the preceding one. However, this model is insufficient for describing the realistic
evolution of a wiki page [ 34 ]. Assume that a vandal blanks the page as it stands at
the i th revision, so that the ( i + 1)th revision is blank. When user u reverts the
( i + 1)th revision to the previous revision, the revert creates a new revision, and
the content of the ( i + 2)th revision is identical to that of the i th. This scenario
raises several problems: (1) users whose contributions were deleted by the vandal
are penalized unfairly; (2) u is erroneously considered to be the owner of all the
content of the ( i + 2)th revision; and (3) the true original owner(s) are denied
ownership of the content they actually contributed. We and others [ 37 ] address
this issue by ignoring these spurious insertions and deletions caused by reverts.
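
The blanking scenario above can be sketched as follows, assuming that a revert restores an earlier revision verbatim so that it can be recognized by comparing content hashes. That assumption, the function name, and the sample history are hypothetical and are not taken from the text, which does not specify how reverts are detected.

```python
import hashlib


def find_identity_reverts(revisions):
    """Map each reverting revision index to the earlier index it restores.

    Assumption for illustration: a revert reproduces an earlier revision
    exactly. The whole history is searched, not a fixed window.
    """
    last_seen = {}  # content digest -> most recent revision index
    reverts = {}
    for k, text in enumerate(revisions):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        # j == k - 1 would be a null edit, not a revert, so it is skipped.
        if j is not None and j < k - 1:
            reverts[k] = j
        last_seen[digest] = k
    return reverts


# Hypothetical history: index 2 is the vandal's blanking, index 3 the revert.
history = [
    "Intro.",
    "Intro. Body text.",
    "",
    "Intro. Body text.",
]

reverts = find_identity_reverts(history)
```

For this history, `reverts` maps index 3 to index 1. An ownership tracker could then skip the deletions recorded at index 2 and the insertions recorded at index 3, leaving the tokens with the owners they had at index 1.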
However, in [ 37 ], the authors decided to process only up to the third successive
revision in order to extract reverts and assign ownership. Our study of Wikipedia
shows that about 6% of reverts return the i th revision of a page to the j th, where
i − j > 3. For this reason, in order not to lose any information, we process all
7 http://en.wikipedia.org/wiki/Levenshtein_distance