Textual Genre Analysis and Identification - Ambient Intelligence for Scientific Discovery


ambiguous or incorrect. In cases where they found errors, students proposed

changes to the software's internal dictionaries as part of their log-work. If their

proposals were verified by the course instructors, the internal dictionaries were

changed to reflect them. It is beyond the scope of this paper to say more about

these methods. Further discussion about them is available elsewhere [4].

5 Coding Heuristics: Accommodating Surface Language Indeterminacy

Thus far, we have made a case for manually coding surface language to uncover

genre functions. There are reasons, some we have hinted at already, for a healthy

skepticism toward this project. The surface stream of natural language is fraught

with ambiguity, irregularity, and contextual contingency. It is futile to hope that

communities of speakers will segment surface strings in the perfectly predictable

and convergent way that machines can algorithmically. So in what do we place

some confidence that our string segmentations of English provide reliable grist

for macro-level analysis of texts?

We place this confidence in the fact that we have coded strings from an

idealized perspective of the competent writer. Whatever the hazards of surface

language, competent writers fearlessly take on the hazard and typically succeed

reasonably well in pinning down for readers the experiences they wish to pin

down. How do competent writers do it? We don't pretend to have direct

psychological answers to this question. We believe, however, that we benefited by

using this question, and possible answers to it, as a basis for establishing our

coding rules. More to the point, we idealized the role of a competent writer

as an underlying basis for our coding. Unlike the competent language-user in

the Chomskian sense, who brings to sentence generation an implicit standard

of grammaticality, the competent writer, in the sense we coded for, brings to

the generation process an expertise in pinning down experience for other human

beings in contiguous runs of language.

We further imagined an idealized competent writer as always involved in

an ongoing mental game we called rhetorical scrabble. In conventional scrabble,

players can be dealt any subset of 26 letters and must form legal words with them.

In rhetorical scrabble, we imagined, the writer is dealt in background any legal

word of English (anywhere from 15,000 to 100,000 depending upon the writer's

vocabulary) and is challenged to select a word from the list or create a word series

that completes a felt-experience that another human (a reader) can recover.

The shorter the series, the more the writer needs to worry about ambiguity,

communicating too many possible experiences without communicating one. The

longer the series, the more the writer needs to worry about recurrence and

re-usability, the chances that the series in question, called upon once, will continue

to be called upon in other texts and contexts. The ideal player of rhetorical

scrabble, it is important to keep in mind, boasts not just the largest conceivable

store of English strings. He or she boasts the largest conceivable store of diverse

and re-usable strings.
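
The trade-off between short and long series can be made concrete with a small illustration. The sketch below is not the coding procedure described in this paper; it is a minimal example that assumes whitespace tokenization, a hypothetical three-sentence toy corpus, and hypothetical helper names (ngrams, document_frequency, reusable_strings). It counts how many word series of a given length recur in more than one text, as a rough proxy for re-usability.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_frequency(texts, n):
    """For each n-word series, count how many texts contain it at least once."""
    df = Counter()
    for text in texts:
        tokens = text.lower().split()  # assumption: naive whitespace tokenization
        df.update(set(ngrams(tokens, n)))
    return df


def reusable_strings(texts, n, min_texts=2):
    """Return the n-word series that recur in at least min_texts different texts."""
    df = document_frequency(texts, n)
    return {series for series, count in df.items() if count >= min_texts}


# Hypothetical toy corpus, invented purely for illustration.
toy_texts = [
    "i regret to say that the results were poor",
    "we regret to say that the plan failed",
    "they regret the delay",
]

# Print how many series of each length are shared across texts.
for n in (1, 2, 4, 6):
    print(n, len(reusable_strings(toy_texts, n)))
```

On this toy corpus the count of shared series falls as the series get longer: single words recur easily but say little on their own, while long series say more but are rarely called upon again.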
