Fig. 6 Emails: 2002-02-20a.html and 2002-07-04b.html
4.2 Plagiarism Papers
Plagiarism in academic articles is a well-known issue. The widespread use
of computers and the Internet has made it easier to plagiarize the work of
others. Most cases of plagiarism are found in academia, where documents are
typically scientific papers, essays or reports [27]. Our experiments show that
the KFP method can detect plagiarism very efficiently.
In this case study, our methodology involved acquiring a well-known
plagiarised paper [28] (named Paper-1A) on the independence number of a
graph and its corresponding original paper (named Paper-1B). To test
whether our algorithm can detect the plagiarism, we randomly downloaded a
further 35 academic papers from the Internet (named Paper-2, Paper-3, ...,
Paper-36), all of which concern the same subject, namely the independence
numbers of graphs. Figure 7 shows the first pair of papers: Paper-1A and
Paper-1B.
All of the papers were obtained as PDF files. Owing to the limitations of
the conversion technology, mathematical formulas are not converted properly
when the PDF files are converted into text files: the same formula from
different PDF files may be converted into very different sequences of
special symbols separated by varying numbers of spaces. This introduces
errors when calculating the distance between keywords. To eliminate these
conversion errors, we use the number of letters between the keywords
(instead of the number of words between keywords) as the distance between
keywords.
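The letter-based distance described above can be illustrated with a minimal sketch. It is not the KFP implementation itself: the function name, the restriction to ASCII letters, the whole-word matching and the example keywords are all assumptions made for illustration.

```python
import re

def letter_distances(text, keywords):
    """Count the letters between each pair of consecutive keyword occurrences.

    Assumption: "letters" means ASCII a-z/A-Z, so digits, spaces and the
    special symbols left behind by PDF-to-text conversion are ignored.
    """
    hits = []
    for kw in keywords:
        # Whole-word, case-insensitive match (an illustrative choice).
        pattern = r"\b" + re.escape(kw) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((m.start(), m.end(), kw))
    hits.sort()  # order occurrences by their position in the text

    distances = []
    for (_, end1, kw1), (start2, _, kw2) in zip(hits, hits[1:]):
        gap = text[end1:start2]
        n_letters = sum(1 for ch in gap if ch.isascii() and ch.isalpha())
        distances.append((kw1, kw2, n_letters))
    return distances

# Two hypothetical conversions of the same sentence: the formula between the
# keywords is turned into different symbol runs with different spacing.
text_a = "every graph    ( ) + = 12   contains a vertex"
text_b = "every graph ## 12 $$ contains a vertex"
keywords = ["graph", "vertex"]  # hypothetical keyword set

print(letter_distances(text_a, keywords))
print(letter_distances(text_b, keywords))
```

Both calls report the same distance between "graph" and "vertex", whereas a whitespace-separated word count over the same gap would differ between the two conversions.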
The keyword set consists of 23 frequently used terms in graph theory.
Table 4 and Table 5 indicate the significant difference between the results
of the two methods, KF and KFP.
From Table 4, estimated by the KFP method, the similarity between
Paper-1A (the plagiarised paper) and Paper-1B (the original paper) is 0.78,
and the similarities between all other pairs of papers are less than 0.6, most