7
Textual Signatures: Identifying Text-Types
Using Latent Semantic Analysis to Measure
the Cohesion of Text Structures
Philip M. McCarthy, Stephen W. Briner, Vasile Rus, and
Danielle S. McNamara
7.1 Introduction
Just as a sentence is far more than a mere concatenation of words, a text is far
more than a mere concatenation of sentences. Texts contain pertinent information
that co-refers across sentences and paragraphs [30]; texts contain relations between
phrases, clauses, and sentences that are often causally linked [21, 51, 56]; and texts
that depend on relating a series of chronological events contain temporal features
that help the reader to build a coherent representation of the text [19, 55]. We re-
fer to textual features such as these as cohesive elements, and they occur within
paragraphs (locally), across paragraphs (globally), and in forms such as referential,
causal, temporal, and structural [18, 22, 36]. But cohesive elements, and by consequence
cohesion, do not simply feature in a text as dialogues tend to feature in
narratives, or as cartoons tend to feature in newspapers. That is, cohesion is not
present or absent in a binary or optional sense. Instead, cohesion in text exists on
a continuum of presence, which is sometimes indicative of the text-type in ques-
tion [12, 37, 41] and sometimes indicative of the audience for which the text was
written [44, 47]. In this chapter, we discuss the nature and importance of cohesion;
we demonstrate a computational tool that measures cohesion; and, most impor-
tantly, we demonstrate a novel approach to identifying text-types by incorporating
contrasting rates of cohesion.
7.2 Cohesion
Recent research in text processing has emphasized the importance of the cohesion of
a text in comprehension [5, 43, 44]. Cohesion is the degree to which ideas in the text
are explicitly related to each other and facilitate a unified situation model for the
reader. As McNamara and colleagues have shown, challenging text (such as science)
is particularly difficult for low-knowledge students. These students are cognitively
burdened when they are forced to make inferences across texts [22, 34, 35, 38, 44].
Adding cohesion to text alleviates this burden by filling conceptual and structural
gaps. Recent developments in computational linguistics and discourse processing
have now made it possible to measure this textual cohesion. These developments