the parts of the text are inter-dependent and, therefore, are likely to be structurally
inter-related. In addition, as we know that cohesion exists in texts at the clausal,
sentential, and paragraph level [22], it would be no surprise to find that cohesion
also existed across the parts of the text that constitute the whole of the text. If this
were not the case, parts of text would have to exist that bore no reference to the
text as a whole. Therefore, if we measure the cohesion that exists across identifiable
parts of the text, we can predict that the degree to which the parts co-refer would be
indicative of the kind of text being analyzed. In Labov's [31] narrative model, for
example, we might expect a high degree of coreference between the second section
(the orientation) and the sixth section (the coda): Although the two sections are
textually distant, they are semantically related in terms of the textual elements with
both sections likely to feature the characters, the motive of the story, and the scene
in which the story takes place. In contrast, we might expect less coreference between
the fourth and fifth sections (evaluation and resolution): While the evaluation and
resolution are textually juxtaposed, the evaluation section is likely to offer a more
global, moral and/or abstracted perspective of the story. The resolution, however, is
almost bound to be local to the story and feature the characters, the scene, and the
outcome. Consequently, semantic relations between these two elements are likely to
be less marked.
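
The idea of measuring how strongly identifiable parts of a text relate to one another can be sketched as a pairwise similarity matrix over sections. The example below is a minimal illustration only: it uses cosine similarity over raw word counts as a crude stand-in for a semantic measure, it does not remove function words, and the function names and the four-section toy narrative are hypothetical rather than taken from the study.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased word counts for one section (a crude stand-in for a semantic vector)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def section_similarity_matrix(sections):
    """Return {(name_i, name_j): similarity} for every unordered pair of sections."""
    names = list(sections)
    vectors = {name: term_vector(sections[name]) for name in names}
    return {
        (names[i], names[j]): cosine(vectors[names[i]], vectors[names[j]])
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }


# Hypothetical toy narrative with four labelled sections (invented for illustration).
sections = {
    "orientation": "Anna lived in a small village by the sea with her brother",
    "evaluation": "People often forget that patience matters more than strength",
    "resolution": "Anna and her brother rebuilt the boat and returned to the village",
    "coda": "Anna still lives in the village by the sea",
}

matrix = section_similarity_matrix(sections)
for pair, score in sorted(matrix.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

In this toy input the orientation and coda share characters and setting and so score higher than the adjacent evaluation and resolution, which share no words; real texts would need a measure that captures semantic relatedness rather than literal word overlap.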
By forming a picture of the degree to which textual parts inter-relate, we can
build a representation of the structure of the texts, a prototypical model that we
call the textual signature. Such a signature stands to serve students and researchers
alike. For students, their work can be analyzed to see the extent to which their paper
reflects a prototypical model. Specifically, a parts analysis may help students to see
that sections of their papers are under- or over-represented in terms of the global
cohesion. For researchers, a text-type signature should help significantly in mining
for appropriate texts. For example, the first ten web sites from a Google search for a
text about cohesion (featuring the combined keywords comprehension, cohesion,
coherence, and referential) yielded papers from the fields of composition theory,
English as a foreign language, and cognitive science, not to mention a disparate array
of far less academic sources. While the specified keywords that were entered may
have occurred in each of the retrieved items, the organization of the parts of the
retrieved papers (and their inter-relatedness) would differ. Knowing the signatures
that distinguish the text types would help researchers to locate more effectively
the kind of resources that they require. A further possible benefit of textual
signatures involves Question Answering (QA) systems [45, 52]. Given a question and a
large collection of texts (often in gigabytes), the task in QA is to draw a list of short
answers (the length of a sentence) to the question from the collection. The typical
architecture of a modern QA system includes three subsystems: question processing,
paragraph retrieval, and answer processing. Textual signatures may be able to
reduce the search space in the paragraph retrieval stage by identifying more likely
candidates.
7.5 Latent Semantic Analysis
To assess the inter-relatedness of text sections we used latent semantic analysis
(hereafter, LSA). An extensive review of the procedures and computations involved
in LSA is available in Landauer and Dumais [32] and Landauer et al. [33]. For this