Textual Genre Analysis and Identification
David Kaufer¹, Cheryl Geisler², Suguru Ishizaki¹, and Pantelis Vlachos¹
¹ Carnegie Mellon University, USA
{kaufer,ishizaki,vlachos}@andrew.cmu.edu
² Rensselaer Polytechnic Institute, USA
geislc@rpi.edu
1 Introduction
This chapter reports on a research program that investigates language and text
from a rhetorical point of view. By rhetorical, we mean an approach that features
the relationship between the speaker and the audience or between the writer and
the reader. Fundamental to a rhetorical approach to language is an interest in
linguistic and textual agency, how speakers and writers manage to use language
strategically to affect audiences; and how audiences and readers, agents in their
own right, manage, or not, to pick up on, register, and respond to a speaker's or
writer's bids. Historical and cultural factors play a central role in how speakers
and writers settle into agent roles vis-a-vis listeners and readers. It is therefore
no surprise that rhetorical approaches to language treat language, culture, and
history as deeply permeable with one another. Rhetorical approaches to language
have, since ancient Greece, dominated the education of language users in the
Western educational curriculum [1].
At the heart of our research program has been the development of a text
visualization and analysis environment specifically designed to carry out rhetorical
research with language and text. The environment is called DocuScope [2]. The
DocuScope environment permits human knowledge workers, through
computer-aided visual inspection and coding, to harvest and classify strings of English,
primarily sequences of 1-5 contiguous words. These are strings that, without
conscious effort, speakers and writers use and reuse as part of their vast repertoire of
implicit knowledge relating language and the audience experience. We have
chosen a knowledge-based, expert-system-like approach for our language measures
because we were especially interested in the analysis and discovery of textual
genres. Genres lie at the interaction of language and culture to perform situated
work [3]. To capture them requires a sociological and anthropological breakdown
of texts even more than a formal linguistic analysis.
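
As a rough illustration of what classifying strings of 1-5 contiguous words can look like, the sketch below tags a text against a small phrase library using greedy longest-match lookup. It is not DocuScope's implementation: the library entries, the category labels, the function name tag_strings, and the longest-match strategy are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

MAX_WORDS = 5  # the chapter describes strings of primarily 1-5 contiguous words

# Hypothetical toy library: phrase -> category. Both the phrases and the
# category labels are invented for illustration; they are not DocuScope's.
TOY_LIBRARY: Dict[str, str] = {
    "i think": "first_person_reasoning",
    "in other words": "restatement",
    "once upon a time": "narrative_opening",
    "you": "direct_address",
}


def tag_strings(text: str, library: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (phrase, category) pairs found by greedy longest-match lookup."""
    words = text.lower().split()
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, shrinking down to a single word.
        for n in range(min(MAX_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in library:
                matches.append((candidate, library[candidate]))
                i += n  # skip past the matched words
                break
        else:
            i += 1  # no phrase starts at this word; move on
    return matches


result = tag_strings("Once upon a time I think you knew", TOY_LIBRARY)
print(result)
```

Preferring the longest match keeps a multiword string such as "once upon a time" from being broken into shorter entries. A real string library would be far larger and would need tokenization that handles punctuation, which this sketch ignores.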
In this chapter, we do not focus on the technical details or interface of the
DocuScope environment. Nor do we focus on the details of the string libraries,
which have been discussed elsewhere [4]. We rather focus on some of the
theoretical motivation, empirical assumptions, coding heuristics, and research results
the DocuScope environment has yielded and that it is capable of yielding. More
specifically, the first sections discuss some of the motivation for building the
environment. The middle sections describe the methods we used to build the