performance. It has proposed that the supposed ill-formedness typical
of spontaneous interactive speech is not due to 'performance error'
as has previously been claimed, but that it is part of a highly evolved
system of human social interaction that provides a framework for
the exchange of propositional content alongside the signaling of
interpersonal details.
The chapter has drawn attention to the function of mutual
co-creation of meaning (as expounded by Kendon and others) and shown
that a judicious combination of visual and audio information can be
used to provide information about the dynamics of a discourse such
that deep processing can be constrained to only a relatively small
subset of the actual speech that is produced.
The proposal that interactive speech is as much a social process
as it is a discoursal one was tested using a small robot synthesizer
that processed response activity (rather than speech content)
and was able to sustain a 'conversation' with unpaid volunteers
recruited off the street, using a pre-programmed dialogue
script.
As educated people, exposed to the patterns of written text since
early childhood, many of us are perhaps blind to the actual nature
of the conversational speech that we hear every day, and mentally
transform the fragmented input into a well-formed image of what
the speaker is 'saying'. Like machines, we do our best to map from
the fragmented actual utterance sequence onto a well-formed text
equivalent for easier parsing.
Current paradigms of computer-based spoken dialogue
systems remain firmly grounded in text, and require extensive
transformations between speech and text for processing. This chapter
has proposed that a different system, employing image processing
alongside audio information, might be developed, and that the
granularity and well-formedness of the speech chunks that it processes
might then be closer to that observed in actual human spoken
interaction. The term 'niblets' was proposed for the minimal units of
meaning that form the basic units for processing, and the importance
of prosody in the rendering of each niblet was stressed.
Prosody serves to convey structural information that helps
disambiguate propositional content, but at the same time it also
provides information about the discoursal and social intentions of the
speaker.
There might be a fundamental 'law of nature' that requires
interacting humans to first negotiate on the level of companionship
and then to broaden the discussion to matters of more general interest.