Our current version of the system learns to become better at taking
cooperative turns in real-time dialogue while it is up and running,
improving its own ability to take turns correctly and quickly, with
minimal speech overlap. The results are in line with prior versions
of the system, where the system interacted with itself over hundreds
of trials (Jonsdottir et al., 2008). Evaluation with human subjects
so far consists of a within-subjects study of 5 minutes of continuous
interaction with each user (a total of 50 minutes), in three different
conditions: (1) A closed, noise-free, setup with a very consistent
interlocutor—another instance of itself (“Artificial” condition). (2) An
open-mic setup, using Skype, where the system repeatedly interviews
a fairly consistent interlocutor—the same human (“Single person”
condition). (3) An open-mic setup, using Skype, with individual
inconsistencies where the agent interviews 10 different human
participants consecutively (“10 people” condition). The system adapts
quickly and effectively (linearly) within 2 minutes of interaction, a
result which, in light of most other machine-learning work on the
subject—much of which requires thousands of hand-picked training
examples—is exceptionally efficient.
The rest of this chapter is organized as follows: First, we review
related work, then we detail the architecture and learning mechanisms.
A description of the evaluation setup comes next, followed by the
results, summary, and future work.
2. Related Work
Models of dialogue produced by a standard divide-and-conquer
approach can only address a subset of a system's behaviors, and are
even quite possibly doomed at the outset. This view has been presented
in our prior work (Thórisson, 2008) and is echoed in other work on
dialogue architectures (cf. Moore, 2007). Requiring a holistic approach
to a complex system such as human real-time dialogue may seem to
be impossibly difficult. In our experience, and perhaps somewhat
counterintuitively, when taking a breadth-first approach to the creation
of an architecture that models any complex system—where most
of the significant high-level features of the system to be addressed
are taken into account—the set of likely contributing underlying
mechanisms will be greatly reduced (Schwabacher and Gelsey, 1996),
quite possibly to a small, manageable set, thus greatly simplifying
the task. It is the use of levels of abstraction that is especially
important for cognitive phenomena: Use of hierarchical approaches
is common in other scientific fields such as physics; for example,
behind models of optics lie more detailed models of electromagnetic