Graph Model for Pattern Recognition in Text - Mining and Analyzing Social Networks

Fig. 6 Emails: 2002-02-20a.html and 2002-07-04b.html

4.2 Plagiarism Papers

Plagiarism in academic articles is a well-known issue. The widespread use

of computers and the Internet has made it easier to plagiarize the work of

others. Most cases of plagiarism are found in academia, where documents are

typically scientific papers, essays or reports [27]. Our experiments show that

the KFP method can be used to detect plagiarism very efficiently.

In this case study our methodology involved acquiring a well-known

plagiarised paper [28] (named Paper-1A) on the independence number of a

graph and its corresponding original paper (named Paper-1B). To test

whether our algorithm can detect the plagiarism, we randomly downloaded

another 35 academic papers from the Internet (named Paper-2, Paper-3,

... , Paper-36), all on the same subject, namely the independence

numbers of graphs. Figure 7 shows the first pair of papers: Paper-1A

and Paper-1B.
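The corpus described above contains 37 documents (Paper-1A, Paper-1B and Paper-2 to Paper-36), so comparing every pair of papers means 666 comparisons. The following is a minimal sketch of such an all-pairs comparison, not the authors' implementation. The function names, the `similarity` callable and the `threshold` parameter are assumptions introduced here for illustration; any similarity measure returning a number could be plugged in.

```python
from itertools import combinations


def paper_names():
    """Names used in the case study: Paper-1A, Paper-1B, Paper-2 ... Paper-36."""
    return ["Paper-1A", "Paper-1B"] + [f"Paper-{i}" for i in range(2, 37)]


def flag_pairs(texts, similarity, threshold):
    """Score every unordered pair of documents and keep the suspicious ones.

    texts      -- dict mapping a paper name to its extracted text
    similarity -- hypothetical callable (text_a, text_b) -> float
    threshold  -- pairs scoring at or above this value are reported
                  (the cut-off is an assumption, not a value from the text)
    """
    flagged = []
    for name_a, name_b in combinations(sorted(texts), 2):
        score = similarity(texts[name_a], texts[name_b])
        if score >= threshold:
            flagged.append((name_a, name_b, score))
    # Highest-scoring pairs first.
    return sorted(flagged, key=lambda item: -item[2])
```

Passing the similarity measure in as a parameter keeps the pairing logic independent of the particular method being evaluated.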

All of the papers were obtained as PDF files. Owing to the limitations of

the conversion technology, mathematical formulas are not converted

properly when the PDF files are turned into text files: the same formula

from different PDF files may be converted into very different sequences of

special symbols separated by varying numbers of spaces. This inevitably

introduces errors when calculating the distance between keywords. To

eliminate these conversion errors, we use the number of letters between

the keywords (instead of the number of words between them) as the

distance between keywords.
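To illustrate why counting letters is more robust than counting words, here is a minimal sketch under stated assumptions. The helper names are hypothetical, "letter" is taken to mean any character for which Python's `str.isalpha()` is true, and keyword occurrences are located with a simple case-insensitive regular expression. None of this is taken from the authors' implementation.

```python
import re


def letter_distance(text, start, end):
    """Number of alphabetic characters in text[start:end]."""
    return sum(1 for ch in text[start:end] if ch.isalpha())


def word_distance(text, start, end):
    """Number of whitespace-separated tokens in text[start:end]."""
    return len(text[start:end].split())


def keyword_gaps(text, keywords):
    """Letter distance between each pair of consecutive keyword occurrences.

    Returns a list of (keyword_a, keyword_b, distance) tuples.
    """
    # Try longer keywords first so multi-word terms win over their prefixes.
    ordered = sorted(keywords, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in ordered), re.IGNORECASE)
    matches = list(pattern.finditer(text))
    return [
        (a.group().lower(), b.group().lower(),
         letter_distance(text, a.end(), b.start()))
        for a, b in zip(matches, matches[1:])
    ]


# Two hypothetical extractions of the same sentence; only the spacing
# inside the formula differs.
text_a = "the graph G with a ( G ) >= n / 2 has an independent set"
text_b = "the graph G with a(G)>=n/2 has an independent set"
keywords = ["graph", "independent set"]
```

With these two strings, `keyword_gaps` returns the same distance for both extractions, whereas `word_distance` over the same span differs because the formula splits into a different number of tokens.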

The keyword set consists of 23 frequently used terms in graph theory.

Table 4 and Table 5 show the significant difference in applying the two

methods, KF and KFP.

According to Table 4, the similarity estimated by the KFP method between

Paper-1A (the plagiarised paper) and Paper-1B (the original paper) is 0.78,

and the similarities between all other pairs of papers are less than 0.6, most
