appears not to be so well synchronized with the others except during
the first part of the conversation. Participant 1 is positive throughout,
and docks well with the others. (Participant 2 was the author who, as
one of the experiment participants, tried to maintain a less dominant
role as the dialogue progressed; Participant 1 was a guest.)
6. Multimodal Discourse in the Real World
We carried out an experiment to implement and test the above
observations in the form of an advanced dialogue system.
We installed audio and video sensors on a small robot platform.
This prototype device used a LEGO NXT (Mindstorms) robot as
a mobile platform for a small high-definition webcam and noise-
canceling microphone array, with wired and wireless streaming of
data to a nearby computer (an Apple Mac Mini) for processing. The
webcam provided the 'eyes' and the microphone array the 'ears'; a
second sound sensor gave feedback on background noise levels and,
together with an ultrasonic distance sensor, helped in maneuvering
the device and in locating and facing the current speaker. This sound
sensor provided secondary low-level 'hearing' for the detection of
speech activity, while the ultrasonic distance gauge allowed precise
positioning of the robot so that audio and webcam capture conditions
could be optimized.
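The positioning logic just described might be sketched as a simple proportional controller that drives the robot toward a target capture distance. All names and values here (the target distance, tolerance, and gain) are invented for illustration and are not taken from the original system:

```python
# Sketch of distance-based positioning: drive the robot until the
# ultrasonic reading settles near an assumed ideal capture distance.
# Constants and function names are illustrative, not from the original system.

TARGET_CM = 60.0      # assumed ideal distance for audio/webcam capture
TOLERANCE_CM = 5.0    # dead band around the target
GAIN = 0.05           # proportional gain relating error to motor speed

def positioning_step(distance_cm):
    """Return a signed motor speed in [-1, 1]: positive = forward, 0 = hold."""
    error = distance_cm - TARGET_CM
    if abs(error) <= TOLERANCE_CM:
        return 0.0                      # close enough: stop and capture
    return max(-1.0, min(1.0, GAIN * error))
```

In a real control loop this step would be called on each new ultrasonic reading, stopping the platform once the speaker is within the dead band.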
A modified OpenCV implementation of the Viola-Jones algorithm was
used for face detection from the camera input, and movement was
measured in the region of each detected face, as reported in Campbell
and Douxchamps (2007). Fundamental frequency and RMS amplitude
were used in a crude speaker-detection and overlapping-speech-
detection module. The video information from the face detection
proved useful in matching voice activity to identified persons within
the field of view. The combination of these two types of information
appears highly effective, although we have not yet performed formal
or quantitative tests to support this claim.
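A crude energy-plus-pitch speech detector of the kind described could be sketched as below. This is a minimal illustration, not the authors' code: the frame size, sample rate, thresholds, and function names are all invented, and the F0 estimate uses plain autocorrelation:

```python
import numpy as np

FRAME = 512           # samples per analysis frame (assumed)
RMS_THRESHOLD = 0.02  # energy gate separating speech from background (invented)

def rms(frame):
    """Root-mean-square amplitude of one frame."""
    return float(np.sqrt(np.mean(frame ** 2)))

def f0_autocorr(frame, sr=16000, fmin=75, fmax=400):
    """Crude F0 estimate via autocorrelation; returns 0.0 if unvoiced."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // fmax, sr // fmin
    if hi >= len(ac) or ac[0] <= 0:
        return 0.0
    lag = lo + int(np.argmax(ac[lo:hi]))
    # require a clear periodicity peak before calling the frame voiced
    return sr / lag if ac[lag] > 0.3 * ac[0] else 0.0

def is_speech(frame, sr=16000):
    """Speech activity = enough energy AND a plausible pitch."""
    return rms(frame) > RMS_THRESHOLD and f0_autocorr(frame, sr) > 0
```

Overlapping speech would then show up as voice activity from the audio module while the face-movement measure indicates more than one active face, which is roughly the combination of cues described above.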
The interaction experiment took the form of a short conversation
with visitors who came in freely off the street to view exhibits in the
Science Gallery, this being one of them. The 'exhibit' was labelled as
illustrating the difficulties of robot speech recognition, and visitors
were challenged to test the robot by speaking to it. A speech synthesizer
was activated when a person came into view of the robot and initiated
a series of predetermined utterances, starting with “hello, hi!”. This
was repeated after a gap, and most people responded to the second
greeting. It helped that they could see the robot's camera had spotted