used in DocuScope. That is, a reader deciding whether a given text is a member
of a desired genre reads the text from beginning to end, attending to a variety
of linguistic markers, like those considered by DocuScope, that she intuitively
associates with the target genre. A reader deciding whether a given letter is a
letter of apology or a letter of complaint, for example, will intuitively
associate a high number of first-person pronouns and autobiographical
references with the apology genre and, after coming across a number of such
markers, will classify the text as an apology. The commonality of the
multivariate components in the analysis of both the rhetorical reader and
DocuScope arises, of course, because DocuScope was built by mining rhetorical
understanding to begin with.
While rhetorical readers share with DocuScope a common multivariate framework,
they are not as good at attending to multivariate data. Limits on memory and
attention lead the rhetorical reader to focus on one or two salient variables
positively associated with a genre and to disregard almost entirely evidence of
negatively associated attributes. A rhetorical reader looking for letters of
apology, for example, will easily focus on evidence of autobiographical
reference but ignore other variables positively associated with apologies, such
as acknowledgements and reassurances. A rhetorical reader will, furthermore,
overlook entirely attributes negatively associated with apologies: the presence
of imperatives and requests, for example, which are more associated with
collection letters and count as evidence against the apology genre.
By narrowing the positive attributes and overlooking negative attributes, the
rhetorical reader turns a complex classification task into a simpler one.
Nevertheless, genre identification remains a time-consuming task fraught with
error. It is here that DocuScope can help by building a multivariate model of
genre that explicitly includes both positively and negatively associated
attributes.
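
The contrast between a reader who attends to one salient positive attribute and
a model that also weighs negative attributes can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the attribute
names, the weights of +1 and -1, and both function names are hypothetical and
are not DocuScope's actual categories or scoring method.

```python
# Hypothetical signed weights for the "apology" genre. Positive weights mark
# attributes associated with apologies, negative weights mark attributes that
# count against them. Names and values are illustrative assumptions only.
APOLOGY_WEIGHTS = {
    "first_person": 1.0,
    "autobiographical_reference": 1.0,
    "acknowledgement": 1.0,
    "reassurance": 1.0,
    "imperative": -1.0,
    "request": -1.0,
}


def genre_score(counts, weights=APOLOGY_WEIGHTS):
    """Sum attribute counts, each weighted by its signed association."""
    return sum(weights.get(name, 0.0) * n for name, n in counts.items())


def salient_only_score(counts, salient=("autobiographical_reference",)):
    """Mimic a reader who counts only a few salient positive attributes."""
    return sum(counts.get(name, 0) for name in salient)


# A made-up collection letter: some autobiographical reference, but many
# imperatives and requests.
collection_letter = {
    "first_person": 2,
    "autobiographical_reference": 2,
    "imperative": 3,
    "request": 4,
}

full = genre_score(collection_letter)            # negative evidence included
narrow = salient_only_score(collection_letter)   # negative evidence ignored
```

In this made-up example the narrow score is positive while the full score is
negative, which is the kind of misclassification the paragraph above attributes
to overlooking negatively associated attributes.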
6.2 Testing the Multivariate Model
For a rhetorician, however, the tool is only as good as the underlying model.
To provide a test of validity, we undertook a blind comparison of DocuScope's
judgements with our own rhetorical intuitions in a task of genre identification.
The logic of our test was as follows: If we give DocuScope two relatively small
sets of texts, one containing texts that are members of a given genre and the
other containing texts that are definitely not members, can DocuScope
discriminate well enough between them to successfully classify a much larger
group of texts that it has never seen before?
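
The logic of this test can be sketched as a small train-then-classify routine.
The sketch below uses a nearest-centroid rule purely as a stand-in, since the
text does not say which discriminating procedure DocuScope applies; the
attribute names, the counts, and all function names are hypothetical.

```python
def centroid(vectors):
    """Mean value of each attribute across a set of texts."""
    keys = {k for v in vectors for k in v}
    return {k: sum(v.get(k, 0.0) for v in vectors) / len(vectors) for k in keys}


def distance(a, b):
    """Squared Euclidean distance between two sparse attribute vectors."""
    keys = set(a) | set(b)
    return sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys)


def classify(text_vec, member_centroid, nonmember_centroid):
    """Return True if the text lies closer to the member set's centroid."""
    return distance(text_vec, member_centroid) < distance(text_vec, nonmember_centroid)


# Two small training sets (made-up attribute counts per text).
members = [{"first_person": 8, "imperative": 1}, {"first_person": 6, "imperative": 0}]
nonmembers = [{"first_person": 1, "imperative": 5}, {"first_person": 2, "imperative": 7}]

member_c = centroid(members)
nonmember_c = centroid(nonmembers)

# Texts never seen during training.
unseen = [{"first_person": 7, "imperative": 1}, {"first_person": 0, "imperative": 6}]
labels = [classify(t, member_c, nonmember_c) for t in unseen]
```

Any classifier that learns from a positive and a negative training set would
fit the same outline; the point is only that the model is fitted on the two
small sets and then judged on texts it has not seen.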
For the test, we used an increasingly common problem for scholars of rhetoric:
the identification of texts representing a specific genre from a large-scale
textual database. Traditionally, rhetorical readers select texts of a given
genre through a combination of selective search and rhetorical intuition. For
example, when Charles Bazerman [14] wanted to select the patent applications
filed by Thomas Edison on electric light, he selectively searched the files of
the U.S. Patent Office under the names of Edison and his colleagues, and then
presumed, unproblematically in this case, that the results of this search were
indeed patents. When he