his phrasing, and his timing, to keep the audience in maximum contact,
allowing them to parse and perceive the impact of each small chunk
before moving on to the next. He is talking not 'to', but 'with' his
audience, repeating chunks where necessary, and constantly testing
their comprehension to maintain their attention.
Our task is to provide machines with a similar faculty—to process
speech interactively, taking into account the cognition of the listener.
2. Even Dogs and Young Babies Can Do It!
The human is a socially organized animal, and we are unique among
animals in spending so much time rearing our young. Our infants are
helpless and dependent on a carer for longer than any other animal,
but in turn they spend long hours watching and (more often) listening
to the people around them, and consequently they learn the norms
of human behavioral patterns very early. They become familiar with
the patterns of speech sounds and rhythms of spoken interaction from
even before birth, as the sounds of the mother's speech are carried
into the womb to the hearing infant along with her blood (with its
varying adrenaline levels) as she goes about her daily conversational
activities (Karmiloff and Karmiloff-Smith, 2001).
The Hungarian ethologist Ádám Miklósi has shown that the feature
which most differentiates dogs from wolves, their closest animal
relatives, is that only the former have really learnt to watch and learn
from human behavior. In this way, dogs become companions to the
humans who host and care for them, while wolves lack the capacity
for such serendipitous coexistence (Miklósi, 2008). The capacity to
observe, to interpret behavior and also to empathize is perhaps what
underlies the mechanisms of companionship that are fundamental to
all social relationships.
A key feature of human communication is that we have learnt
to express propositional content and social information
simultaneously. From earliest times, we have watched our fellow
beings and learnt to read information about their cognitive states
from their behavior (Dunbar, 1998). We know whether or not they
are listening, and paying attention, and make continuous estimates
about their levels of comprehension as we speak. We unconsciously
structure our speech to facilitate this process.
Our speech processing technologies, however, presently lack any
such notion of empathy. They also lack the ability to observe the effects
of their actions on others. People consequently feel uneasy with much
of present speech technology, and this may be hampering its acceptance