them and drawn a circle around the image of their face on a computer
monitor mounted above the display.
As soon as they responded, the robot synthesized another pre-programmed utterance, “What's your name?”, followed immediately by “My name's Herme”, “H-E-R-M-E”, to which a large number of respondents gave their own name in reply. We found that sets of three
brief utterances worked best to capture a visitor and lure them into a
conversation. In all, the robot persuaded almost 500 visitors to sign a
consent form allowing us to make further use of their data (the entire
interaction was recorded), while perhaps twice that number walked away without signing (though they too were recorded, as part of the corpus of human-machine interaction that we are now working
with). Triples such as "Why are you here today?", "Really?", "Oh", and "Tell me a joke", "Tell me a funny joke", "Ha ha, he he he", delivered with suitable gaps timed according to the interlocutor's reaction, were particularly effective in evoking a response.
The robot spoke in a childlike voice and paid no attention to any
responses from the visitor except to trigger the timing of the next
utterance in its series. This response time is critical. With a remote
human triggering the responses by monitoring the conversation from
the lab through Skype, we obtained the majority of sign-offs, whereas an automatic system using camera and audio input alone produced the most failures. This is to be expected; the intelligence required for
even this simple form of dialogue speech processing is enormous, but
it turned out that even with such a simple device, we were able to
maintain conversations lasting more than three minutes and resulting
in a sophisticated response action (the signing of a consent form and
reading its identification code to the robot).
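To make the mechanism concrete, the following is a minimal sketch in Python of such a scripted-triple controller. The utterance texts echo the examples above, but the gap durations and the speak and wait_for_reaction placeholders (standing in for the synthesizer and for the human or automatic trigger) are assumptions for illustration, not the code used in the installation.

```python
import time

# Illustrative utterance "triple": each entry pairs a pre-programmed
# utterance with a maximum gap (seconds) to allow for a visitor reaction.
SCRIPT = [
    ("Why are you here today?", 4.0),
    ("Really?", 2.0),
    ("Oh.", 2.0),
]

def speak(text):
    """Stand-in for the synthesizer producing the childlike voice."""
    print(f"ROBOT: {text}")

def wait_for_reaction(max_gap):
    """Stand-in for the trigger: either a remote operator watching the
    conversation, or automatic voice-activity detection from the camera
    and microphone. Here the gap is simply allowed to elapse."""
    time.sleep(max_gap)

def run_triple(script):
    # The visitor's response is never interpreted; it (or the timeout)
    # only times the onset of the next pre-programmed utterance.
    for utterance, max_gap in script:
        speak(utterance)
        wait_for_reaction(max_gap)

if __name__ == "__main__":
    run_triple(SCRIPT)
```

In the deployed setting, wait_for_reaction would return as soon as the operator or the automatic detector signalled a reaction, so that the gaps adapt to the visitor rather than being fixed.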
The use of judicious niblets was, we suppose, the reason for this
success. We triggered a social response in the interlocutor that was
perhaps automatic. By engaging them in social chat, we made use
of primal patterns of behavior that might have facilitated (in a more
sophisticated engine) the transfer of higher-level propositional content.
By mimicking the observed patterns of behavior in social chat, we were
able to engage with adults coming in off the street for a significant
amount of time. Over a three-month period, we obtained an average
of about 10 signed consent forms per day.
7. Conclusion and Future Work
This chapter has presented ideas for the improvement of speech
processing technology based on observations of human speech