to ST (p < .001); SP to DT (p < .001); and ST to DT (p < .05). Most importantly, the
interaction between corpus and section was significant, F(6, 171) = 4.196, MSE = .007,
p < .001, indicating that the differences between the sections depend on the type of
corpus.
The results from this experiment indicate that the corpus using the same papers
for the comparisons (SP) shows greater internal difference than do those with
either similar or different themes (i.e., ST, DT). This result is largely due to the
stronger AR comparison generated in the SP corpus. While the signatures generated
from the ST and DT corpora are internally similar, the results of this experiment
offer evidence that the degree of similarity between sections within the corpora is
significantly different.
These results allowed us to extend our signature assumption to the prediction that
LSA can differentiate three text-types: the same paper, similarly themed papers,
and differently themed papers.
7.12 Discussion
The results of this study suggest that LSA comparisons of textual sections can
produce an identifiable textual signature. These signatures serve as a prototypical
model of the text-type and are distinguishable from those produced by texts which
are merely similar in theme (ST), or similar in field (DT).
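As a minimal sketch of the idea, assuming a signature is the set of cosines between the abstract and each of the other sections, the comparison could be computed as follows. The section vectors here are made-up placeholders standing in for vectors from a trained LSA space (which this sketch does not build), and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def signature(section_vectors, anchor="abstract"):
    """Map each non-anchor section to its cosine with the anchor section."""
    anchor_vec = section_vectors[anchor]
    return {
        name: cosine(anchor_vec, vec)
        for name, vec in section_vectors.items()
        if name != anchor
    }


# Hypothetical 3-dimensional section vectors; real LSA vectors would
# typically have a few hundred dimensions.
paper = {
    "abstract": [0.6, 0.3, 0.1],
    "introduction": [0.5, 0.4, 0.1],
    "methods": [0.1, 0.2, 0.7],
    "results": [0.4, 0.1, 0.5],
    "discussion": [0.6, 0.3, 0.2],
}

for section, value in signature(paper).items():
    print(f"abstract vs {section}: {value:.3f}")
```

Averaging such signatures over many papers of one text-type would give the kind of prototypical profile described above; how the authors aggregated theirs is not shown in this sketch.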
Textual signatures of the type produced in this study have the potential to
be used for a number of purposes. For example, students could assess how closely
the signature of their papers reflects a prototypical signature. The discrepancies
between the two sets of LSA cosines may indicate to the student where information is
lacking, redundant, or irrelevant. For researchers looking for supplemental material,
the signatures method could be useful for identifying texts from the same field,
texts of the same theme, and even the part of the text in which the researcher is
interested. Related to this issue is a key element in Question Answering systems:
as textual signatures stand to identify thematically related material, the retrieval
stage of QA systems may be better able to rank its candidate answers.
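The two uses suggested above, checking a paper against a prototype and ranking candidate texts, can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the cosine values are invented, the 0.1 threshold is arbitrary, and Euclidean distance between signatures is only one plausible way to compare them.

```python
import math


def discrepancies(paper_sig, prototype_sig):
    """Per-section difference between a paper's cosines and the prototype's."""
    return {name: paper_sig[name] - prototype_sig[name] for name in prototype_sig}


def flag_sections(paper_sig, prototype_sig, threshold=0.1):
    """Sections whose cosine deviates from the prototype by more than the threshold."""
    diffs = discrepancies(paper_sig, prototype_sig)
    return [name for name, diff in diffs.items() if abs(diff) > threshold]


def signature_distance(sig_a, sig_b):
    """Euclidean distance between two signatures (an assumed similarity measure)."""
    return math.sqrt(sum((sig_a[name] - sig_b[name]) ** 2 for name in sig_a))


def rank_by_signature(candidates, prototype_sig):
    """Candidate names ordered from closest to farthest from the prototype."""
    return sorted(
        candidates,
        key=lambda name: signature_distance(candidates[name], prototype_sig),
    )


# Hypothetical abstract-to-section cosines.
prototype = {"introduction": 0.70, "methods": 0.45, "results": 0.60, "discussion": 0.75}
student = {"introduction": 0.68, "methods": 0.20, "results": 0.58, "discussion": 0.74}

print(flag_sections(student, prototype))

candidates = {
    "paper_a": {"introduction": 0.72, "methods": 0.44, "results": 0.61, "discussion": 0.73},
    "paper_b": {"introduction": 0.40, "methods": 0.10, "results": 0.30, "discussion": 0.35},
}
print(rank_by_signature(candidates, prototype))
```

A flagged section only signals that its relation to the abstract is atypical; whether information is lacking, redundant, or irrelevant would still have to be judged by reading the section.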
Future research will focus on developing a range of textual signatures beyond the
abstract comparisons outlined in this chapter. Specifically, comparisons of section
parts from the perspective of the introduction, methods, results, and discussion
sections need to be examined. This broader scope offers the possibility of greater
accuracy in textual identification. For example, papers that were only thematically
related would likely have higher overlaps generated from introduction sections than
from other sections. Introductions feature a review of the literature which would
likely be highly consistent across papers within the same theme, whereas the other
sections (especially the results section) would likely be significantly different from
paper to paper.
In addition to extending the perspectives of signatures, we also need to consider
how other indices may help us to better identify textual signatures. Coh-Metrix
generates a variety of alternative lexical similarity indices such as stem, lemma,
and word overlap. While these indices do not compare semantic similarities such as
table/chair or intelligence/creativity (as LSA does), they do compare lexical overlaps
such as produce/production, suggest/suggests, and investigate/investigated. Indices