appears not to be so well synchronized with the others except during
the first part of the conversation. Participant 1 is positive throughout,
and docks well with the others. (Participant 2 was the author who, as
one of the experiment participants, tried to maintain a less dominant
role as the dialogue progressed; Participant 1 was a guest.)
6. Multimodal Discourse in the Real World
We carried out an experiment to implement and test the above
observations in the form of an advanced dialogue system.
We installed audio and video sensors on a small robot platform.
This prototype device used a LEGO NXT (Mindstorms) robot as
a mobile platform for a small high-definition webcam and noise-
canceling microphone array, with wired and wireless streaming of
data to a nearby computer (an Apple Mac Mini) for processing. The
webcam provided the 'eyes' and the microphone array the 'ears'; a
second sound sensor gave feedback on background noise levels and,
together with an ultrasonic distance sensor, helped in maneuvering
the device and in locating and facing the current speaker. This sound
sensor provided secondary low-level 'hearing' for the detection of
speech activity, while the ultrasonic distance gauge allowed precise
positioning of the robot so that audio and webcam capture conditions
could be optimized.
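The positioning logic just described might be sketched as a simple proportional controller that drives the robot toward a target capture distance. All names and values here (the target distance, tolerance, and gain) are invented for illustration and are not taken from the original system:

```python
# Sketch of distance-based positioning: drive the robot until the
# ultrasonic reading settles near an assumed ideal capture distance.
# Constants and function names are illustrative, not from the original system.

TARGET_CM = 60.0      # assumed ideal distance for audio/webcam capture
TOLERANCE_CM = 5.0    # dead band around the target
GAIN = 0.05           # proportional gain relating error to motor speed

def positioning_step(distance_cm):
    """Return a signed motor speed in [-1, 1]: positive = forward, 0 = hold."""
    error = distance_cm - TARGET_CM
    if abs(error) <= TOLERANCE_CM:
        return 0.0                      # close enough: stop and capture
    return max(-1.0, min(1.0, GAIN * error))
```

In a real control loop this step would be called on each new ultrasonic reading, stopping the platform once the speaker is within the dead band.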
A modified OpenCV implementation of the Viola-Jones algorithm was
used for face detection from the camera input, and movement was
measured in the region of each detected face, as reported in Campbell
and Douxchamps (2007). Fundamental frequency and RMS amplitude
were used in a crude speaker-detection and overlapping-speech-
detection module. The video information from the face detection
proved useful in matching voice activity to identified persons within
the field of view. The combination of these two types of information
appears highly effective, although we have not yet performed formal
or quantitative tests to support this claim.
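A crude energy-plus-pitch speech detector of the kind described could be sketched as below. This is a minimal illustration, not the authors' code: the frame size, sample rate, thresholds, and function names are all invented, and the F0 estimate uses plain autocorrelation:

```python
import numpy as np

FRAME = 512           # samples per analysis frame (assumed)
RMS_THRESHOLD = 0.02  # energy gate separating speech from background (invented)

def rms(frame):
    """Root-mean-square amplitude of one frame."""
    return float(np.sqrt(np.mean(frame ** 2)))

def f0_autocorr(frame, sr=16000, fmin=75, fmax=400):
    """Crude F0 estimate via autocorrelation; returns 0.0 if unvoiced."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = sr // fmax, sr // fmin
    if hi >= len(ac) or ac[0] <= 0:
        return 0.0
    lag = lo + int(np.argmax(ac[lo:hi]))
    # require a clear periodicity peak before calling the frame voiced
    return sr / lag if ac[lag] > 0.3 * ac[0] else 0.0

def is_speech(frame, sr=16000):
    """Speech activity = enough energy AND a plausible pitch."""
    return rms(frame) > RMS_THRESHOLD and f0_autocorr(frame, sr) > 0
```

Overlapping speech would then show up as voice activity from the audio module while the face-movement measure indicates more than one active face, which is roughly the combination of cues described above.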
The interaction experiment took the form of a short conversation
with visitors who came in freely off the street to view exhibits in the
Science Gallery, this being one of them. The 'exhibit' was labelled as
illustrating the difficulties of robot speech recognition, and visitors
were challenged to test the robot by speaking to it. A speech synthesizer
was activated when a person came into view of the robot and initiated
a series of predetermined utterances, starting with “hello, hi!”. This
was repeated after a gap, and most people responded to the second
greeting. It helped that they could see the robot's camera had spotted