Fig. 6 Emails: 2002-02-20a.html and 2002-07-04b.html
4.2 Plagiarism Papers
Plagiarism in academic articles is a well-known issue. The widespread use
of computers and the Internet has made it easier to plagiarize the work of
others. Most cases of plagiarism are found in academia, where documents are
typically scientific papers, essays or reports [27]. Our experiments show that
the KFP method can detect such plagiarism very efficiently.
In this case study we acquired a well-known plagiarised paper [28] (named
Paper-1A) on the independence number of a graph, together with its
corresponding original paper (named Paper-1B). To test whether our algorithm
can detect the plagiarism, we randomly downloaded a further 35 academic
papers from the Internet (named Paper-2, Paper-3, ..., Paper-36), all of
which deal with the same subject, namely the independence numbers of graphs.
Figure 7 shows the first pair of papers: Paper-1A and Paper-1B.
All of the papers were obtained as PDF files. Because of limitations in the
conversion technology, mathematical formulas are not converted properly when
the PDF files are turned into text files: the same formula taken from
different PDF files may be converted into very different sequences of
special symbols separated by varying numbers of spaces. This inevitably
introduces errors when calculating the distance between keywords. To
eliminate the errors introduced by the conversion, we use the number of
alphabetic characters between the keywords (instead of the number of words
between keywords) as the distance between keywords.
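The effect of this choice can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names letter_distance and word_distance, the restriction to ASCII letters, the case-insensitive matching, and the pairing of the first occurrence of one keyword with the next occurrence of the other are all assumptions made for illustration, and the two sample strings are invented to mimic one formula converted in two different ways.

```python
import re


def _text_between(text, kw_a, kw_b):
    """Return the text between the first kw_a and the next kw_b, or None.

    Pairing the first occurrence of kw_a with the following occurrence of
    kw_b is an assumption; the source does not say how occurrences are paired.
    """
    lower = text.lower()
    start = lower.find(kw_a.lower())
    if start == -1:
        return None
    end_a = start + len(kw_a)
    pos_b = lower.find(kw_b.lower(), end_a)
    if pos_b == -1:
        return None
    return text[end_a:pos_b]


def letter_distance(text, kw_a, kw_b):
    """Distance measured as the number of ASCII letters between the keywords."""
    between = _text_between(text, kw_a, kw_b)
    if between is None:
        return None
    return len(re.findall(r"[A-Za-z]", between))


def word_distance(text, kw_a, kw_b):
    """Distance measured as the number of whitespace-separated tokens."""
    between = _text_between(text, kw_a, kw_b)
    if between is None:
        return None
    return len(between.split())


# The same (invented) sentence, with the formula converted two different ways.
clean = "graph with a(G) <= n/2 has an independent set"
garbled = "graph with a ( G )  <  =  n / 2 has an independent set"

for sample in (clean, garbled):
    print(word_distance(sample, "graph", "independent set"),
          letter_distance(sample, "graph", "independent set"))
```

In this sketch the word-based distance changes with the spacing of the converted formula, while the letter-based distance is the same for both strings, because symbols, digits and spaces are not counted.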
The keyword set consists of 23 frequently used terms in graph theory.
Table 4 and Table 5 show that applying the two methods, KF and KFP, gives
significantly different results.
According to Table 4, the KFP method estimates the similarity between
Paper-1A (the plagiarised paper) and Paper-1B (the original paper) as 0.78,
while the similarities between all other pairs of papers are less than 0.6, most
 