chapter, however, we offer only an overview of the theory of LSA, its method of
calculations, and a summary of some of the many studies that have incorporated its
approach.
LSA is a technique that uses a large corpus of texts together with singular value
decomposition to derive a representation of world knowledge [33]. LSA is based
on the idea that any word (or group of words) appears in some contexts but not
in others. Thus, words can be compared by the aggregate of their co-occurrences.
This aggregate serves to determine the degree of similarity between such words [13].
LSA's practical advantage over shallow word overlap measures is that it goes beyond
lexical similarities such as chair/chairs or run/ran , and manages to rate the relative
semantic similarity between terms such as chair/table , table/wood , and wood/forest .
As such, LSA not only tells us whether two items are the same, it also tells us how
similar they are. Further, as Wolfe and Goldman [54] report, there is substantial
evidence that the reliability of LSA does not differ significantly from that of
human raters asked to perform the same judgments.
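The calculation described above can be illustrated with a minimal sketch. It assumes a raw term-by-document count matrix, a truncated singular value decomposition, and cosine similarity between the resulting term vectors. The four-sentence corpus, the function names, and the choice of k = 2 dimensions are illustrative assumptions only; LSA spaces used in practice are derived from very large corpora, usually with weighted counts and a few hundred dimensions.

```python
import numpy as np

def build_term_document_matrix(docs):
    """Return (vocab, X) where X[i, j] counts occurrences of term i in document j."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            X[index[w], j] += 1
    return vocab, X

def lsa_term_vectors(X, k):
    """Keep the k largest singular values; each row is one term's vector."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Toy corpus (hypothetical, far too small to yield meaningful similarities).
docs = [
    "the chair stands beside the table",
    "the table is made of wood",
    "wood comes from the forest",
    "the forest has many trees",
]
vocab, X = build_term_document_matrix(docs)
T = lsa_term_vectors(X, k=2)
idx = {w: i for i, w in enumerate(vocab)}

# Graded similarity between two different words, rather than a same/different verdict.
sim = cosine(T[idx["chair"]], T[idx["table"]])
print(f"chair/table similarity: {sim:.3f}")
```

Because the comparison is made in the reduced space rather than on the surface forms, two words can receive a high score without sharing any characters, which is the property that distinguishes LSA from word overlap measures.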
As a measure of semantic relatedness, LSA has proven to be a useful tool in
a variety of studies. These include computing ratings of the quality of summaries
and essays [17, 29], tracing essay elements to their sources [15], optimizing
text-to-reader matches based on reader knowledge and the projected difficulty of unread
texts [53], and predicting human interpretation of metaphor difficulty [28]. For
this study, however, we adapted the LSA cohesion measuring approach used by
Foltz, Kintsch & Landauer [16]. Foltz and colleagues formed a representation of
global cohesion by using LSA to analyze the relationship of increasingly distant textual
paragraphs. As the distance increased, the LSA similarity score decreased.
The results suggested that LSA was a useful and practical tool for measuring the
relative degrees of similarity between textual sections. In our study, however, we
replaced Foltz and colleagues' comparison of paragraphs with a comparison of journal
sections, and rather than assuming that cohesion would decrease relative to distance,
we made predictions based on the relative similarity between the sections of the
article.
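A section-level comparison of this kind might be sketched as follows. The sketch treats each section of one article as a document, projects the sections into a reduced space, and reports the cosine between the abstract and each other section. The section texts are invented placeholders, and the function names, the value of k, and the decision to build the space from the article itself are assumptions made for illustration; they do not reproduce the LSA space or procedure used in the study.

```python
import numpy as np

def section_vectors(sections, k):
    """Map each section name to a k-dimensional vector in a reduced SVD space."""
    names = list(sections)
    tokens = {n: sections[n].lower().split() for n in names}
    vocab = sorted({w for ws in tokens.values() for w in ws})
    row = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(vocab), len(names)))
    for j, n in enumerate(names):
        for w in tokens[n]:
            X[row[w], j] += 1
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    D = Vt[:k].T * s[:k]  # one row per section
    return {n: D[j] for j, n in enumerate(names)}

def cosine(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Placeholder texts (hypothetical); a real analysis would use full journal sections.
sections = {
    "abstract": "we study text cohesion and report that sections differ in similarity",
    "introduction": "text cohesion matters for readers and this study examines similarity",
    "methods": "we collected articles and computed vectors for every section",
    "results": "the similarity scores differ across the section comparisons",
    "discussion": "the findings suggest that cohesion between sections is informative",
}
vecs = section_vectors(sections, k=3)

# Comparison types: abstract to introduction (AI), methods (AM), results (AR), discussion (AD).
pairs = {"AI": "introduction", "AM": "methods", "AR": "results", "AD": "discussion"}
scores = {label: cosine(vecs["abstract"], vecs[name]) for label, name in pairs.items()}
for label, score in scores.items():
    print(f"{label}: {score:.3f}")
```

The four scores for one article form the kind of profile against which predictions about relative similarity can be checked; with placeholder text of this size the particular values carry no meaning.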
7.6 Predictions
The abstract section was selected as the primary source of comparison as it is the
only section whose function is specifically to relate the key elements of each other
section of the paper. But the abstract does not relate to each other section of the
paper equally. Instead, the abstract outlines the theme of the study (introduction);
it can refer to the basic method used in the study (methods); it will briefly state a
prominent result from the study (results); and it will then discuss the relevance of the
study's findings (discussions). This definition allowed us to make predictions as to the
signature generated from such comparisons. Specifically, we predicted that abstracts
would feature far greater reference to the introduction (AI comparison type) and
discussion sections (AD comparison type), less reference to the results section (AR
comparison type), and less reference still to the methods section (AM comparison
type). The reason for such predictions is that abstracts would take more care to
set the scene of the paper (the introduction) and the significance of the findings
(discussions). The results section, although important, tends to see its key findings
restated in the discussion section, where it is subsumed into the significance of the