The implication of such a processing sequence is that each
utterance will be well-formed, relatively complete in itself, and that
each 'turn' will provide sufficient lexical content for an intact parse
to be obtained. We are not concerned here with the details of components
that process the propositional content of each utterance to produce a
response; instead, we focus on the nature of the interaction per se and
examine the so-called 'ill-formedness' of individual utterances in
spontaneous conversational speech. Current technology does not
perform well when processing normal everyday speech because it
often fails to cope with its highly fragmented nature. We will look
next at some implications of this for interactive speech processing.
Human spoken interaction in its everyday form is not essentially
an asynchronous process, and features considerable fragmentation or
overlapping of syntactically ill-formed utterances, along with regular
backchannel noises, interruptions and mutual completions of the
other partner's utterance as interlocutors interact (Goffman, 1961;
Schegloff, 2007). Conversation is structured and balanced much like a
dance, with 'listening' being an active and contributive process, i.e.
not just passively receiving information but actively collaborating in
the production of meaning. Participation in a conversation requires
constant mutual monitoring of attention as the themes are conjointly
expounded and mutually developed (Chapple, 1939; Kendon, 1990).
Speech has evolved to facilitate this mutual structuring, which by
definition is lacking in written text; writing evolved considerably
later to perform a significantly different function.
The following example illustrates the difference in style between
written text and a spoken form of interaction. The content appears
to be highly fragmented, and sentence boundaries are unclear (if
the concept of 'sentence' has any meaning at all in spoken sequences).
It rambles but in a natural way, easy for us to follow if we think of it
as sounds that we are listening to (we can even 'hear' the tone-of-voice
change perhaps), but this content would be very difficult for a machine
to parse. This sample is from a monologue, from the transcription of a
prepared public talk by a top-ranking newspaper journalist, Seymour
Hersh, who writes for the New York Times and New Yorker. It is
part of his keynote speech to the ACLU in 2004 (from
http://www.informationclearinghouse.info/article6492.htm):
“You know, | we all know the story | of how mad they got at General
Shinseki, | who I think is going to run for the Senate in Hawaii | and
should, | for Inouye's seat, | he's a great general. | The important thing
about Shinseki | for me, | and this is just heuristic, | I don't know this, |
the important thing about Shinseki is this. | He testifies before the Gulf War
| we're going to need a couple hundred thousand troops | and everybody,