In the framework we are exploring, the close reader is the reader who has
learned how to use a machine “reading” to get closer to the text as a human
reader. Within an approach we have experimented with in our own classrooms,
students use the results of the machine as a perch to recover their own implicit
judgments about composing choices and genre. There is irony in the fact that
humanists associate reading with the most intimately human of activities, remote
from machines [9]. And yet to make progress on Richards' dream of a practical
theory of close reading, human readers, we submit, may find that putting a
machine in the loop, under the right conditions, enhances rather than detracts
from their intimacy with texts.
4 Methods for Finding and Classifying Strings as Coding Units
How did we find and classify strings of reader experience in English texts when
they are indeterminate and hard to detect and harvest in our normal reading
processes? We built a specialized environment for the task. We relied on an expert
system, where we used technology to harvest, rather than replace or mimic,
what a culturally-in-the-know human writer or reader knows about rhetorical
actions. Rather than make the computer smart [10] about English strings, we
would create an environment that would allow us to encode our knowledge about
strings based on a well known string-matching algorithm [11, 12].
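The kind of environment described here can be pictured with a small sketch. The dictionary entries below reuse the "smeared" strings discussed later in this section, but the NEGATIVE_AFFECT label and the naive longest-match scan are our illustrative assumptions, not the authors' actual dictionaries or the string-matching algorithm of [11, 12]:

```python
# A minimal sketch of dictionary-driven string matching. Entries,
# the NEGATIVE_AFFECT label, and the longest-match scan are
# illustrative assumptions, not the authors' actual system.

DICTIONARY = {
    "smeared the politician": "NEGATIVE_AFFECT",
    "smeared him": "NEGATIVE_AFFECT",
    "smeared them": "NEGATIVE_AFFECT",
}

def match_strings(text, dictionary):
    """Return (position, entry, category) triples for every
    dictionary string found in text, preferring the longest
    entry that matches at each position."""
    entries = sorted(dictionary, key=len, reverse=True)
    lowered = text.lower()
    matches, i = [], 0
    while i < len(lowered):
        for entry in entries:
            if lowered.startswith(entry, i):
                matches.append((i, entry, dictionary[entry]))
                i += len(entry)
                break
        else:
            i += 1
    return matches

print(match_strings("Critics smeared the politician daily.", DICTIONARY))
# [(8, 'smeared the politician', 'NEGATIVE_AFFECT')]
```

Trying longer entries first lets a more specific string pre-empt a shorter one, which becomes important once dictionaries are elaborated in the way the rest of this section describes.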
After coding a list of categorized strings and observing how they matched
on a set of sample texts, we used our string-matcher on new texts to test our
prior codings for accuracy and completeness. When we discovered our string-matcher
making incorrect (i.e., ambiguous, misclassified, or incomplete)
matches on the new texts, we would use this information to elaborate the strings
our string matcher could recognize. By repeating a cycle of coding strings on
training texts and then testing and elaborating strings based on how well they
explained priming actions on new texts, we were able to grow our catalog of
strings systematically and consistently.
Let's now watch this process in action. Imagine reading a newspaper contain-
ing “smeared the politician” as a verb phrase. This first inspection would prompt
the generalization that the string “smeared the politician” conveys the idea of
negative affect. We would code it thus in our dictionaries. We would input into
our dictionary many syntactic variations that also conveyed this negative idea
(e.g., smeared him, smeared them). Now these dictionaries would
be applied to new collections of uncoded texts, allowing us to find the limits of
our own initial coding assumptions. For example, the coding of “smeared him”
as negative affect would seem incorrect in the longer string environment of
“smeared him with soap.” Although errors of this type were ubiquitous, partic-
ularly in our early generations of coding, the software made it relatively easy
for us to spot the mistake and revise the underlying dictionaries accordingly.
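The “smeared him with soap” correction can be expressed as a longest-match override: the elaborated, more specific entry suppresses the earlier, overly general one. The None coding for the longer entry is an illustrative assumption about how such an elaboration might be recorded:

```python
# Sketch of a longest-match override repairing the "smeared him
# with soap" misclassification. The None coding for the elaborated
# entry is an illustrative assumption.

CODED_STRINGS = {
    "smeared him": "NEGATIVE_AFFECT",
    "smeared him with soap": None,  # elaborated entry: no affect coded
}

def code_at(text, pos, dictionary):
    """Return (entry, category) for the longest dictionary string
    starting at pos, or None if nothing matches there."""
    best = None
    lowered = text.lower()
    for entry, category in dictionary.items():
        if lowered.startswith(entry, pos):
            if best is None or len(entry) > len(best[0]):
                best = (entry, category)
    return best

print(code_at("They smeared him with soap.", 5, CODED_STRINGS))
# -> ('smeared him with soap', None): the longer environment wins,
#    suppressing the overly general negative coding
```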
We thought of this rapid revision process as “improving the eyesight” of the
dictionaries by putting human readers out front to assist them. Over three years,