and anything in the architecture's organization. Schlangen (2006) has
successfully used machine learning to categorize prosodic features
from a corpus, showing that acoustic features can be learned. Traum and
Heeman (1996) have addressed the problem of utterance segmentation,
showing that prosodic features such as boundary tones do play a role
in turn-taking. As far as we know, none of this work has been applied
to real-time situations. Bonaiuto and Thórisson (2008) demonstrate a
system of two simulated interacting dialogue participants that learn
to exploit each other's multimodal behaviors (that is, modality-independent,
multi-dimensional behaviors) to achieve a cooperative
interaction in which minimizing speech overlaps and speech pauses is
the shared goal (the standard situation in amicable interactions
between acquaintances, friends, and family, and one shared with the present
work). Using a neuro-cognitive model of learning, the work shows that
emergent properties of dialogue (pauses, hesitations, interruptions;
i.e., negotiations of turn) can be learned via the general framework
provided by the YTTM and its fluid states, coupled with Bonaiuto and
Arbib's ACQ model of learning (Bonaiuto and Arbib, 2010). While
Bonaiuto and Thórisson's system was based on the YTTM, the
implementation of the learning mechanisms was meant to run neither
on-line nor in real time.
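The shared goal described above, minimizing speech overlaps and speech pauses, can be made concrete with a small sketch. This is not the cost function used by Bonaiuto and Thórisson (2008); the function names, the interval representation, and the weights `w_overlap` and `w_silence` are all assumptions introduced here for illustration. Each speaker's speech is assumed to be a list of non-overlapping (start, end) intervals in seconds.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds; hypothetical representation


def total_overlap(a: List[Interval], b: List[Interval]) -> float:
    """Total time during which both speakers talk at once.

    Assumes the intervals within each speaker's list do not overlap each other.
    """
    return sum(
        max(0.0, min(a_end, b_end) - max(a_start, b_start))
        for a_start, a_end in a
        for b_start, b_end in b
    )


def total_silence(a: List[Interval], b: List[Interval], dialogue_end: float) -> float:
    """Total time in [0, dialogue_end] during which neither speaker talks."""
    silence = 0.0
    cursor = 0.0  # end of the speech seen so far
    for start, end in sorted(a + b):
        if start > cursor:
            silence += start - cursor  # gap before this stretch of speech
        cursor = max(cursor, end)
    silence += max(0.0, dialogue_end - cursor)  # trailing silence
    return silence


def turn_taking_cost(
    a: List[Interval],
    b: List[Interval],
    dialogue_end: float,
    w_overlap: float = 1.0,  # assumed weight, not from the source
    w_silence: float = 1.0,  # assumed weight, not from the source
) -> float:
    """Weighted sum of overlap and silence; lower means smoother turn-taking."""
    return w_overlap * total_overlap(a, b) + w_silence * total_silence(a, b, dialogue_end)


speaker_a = [(0.0, 2.0), (5.0, 6.0)]
speaker_b = [(1.5, 4.0)]
cost = turn_taking_cost(speaker_a, speaker_b, dialogue_end=7.0)
```

In this example the speakers overlap between 1.5 s and 2.0 s, and nobody speaks between 4.0 s and 5.0 s or after 6.0 s. A learner that shortens either quantity lowers the cost.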
In summary, no prior work has implemented a comprehensive
dialogue system that learns turn-taking skills on-line and
adapts to its interlocutors in real time. The turn-taking
model presented here is an extended version of the YTTM (Thórisson,
2002b) with the simplification that the communicative channel is limited
to the speech modality. As already mentioned, turn-taking is modeled
as an agent-oriented negotiation process with eight semi-global
“cognitive contexts,” or fluid states, that define the perceptual and
behavioral disposition of the system at any point in the dialogue.
These contexts support, in effect, a distributed planning
and control system for both perception and action; the distributed
learning scheme we present below implements a negotiation-driven
tuning of real-time turn-taking behaviors within this framework.
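One way to picture how a context can define the system's perceptual and behavioral disposition is as a switch that determines which modules are active. The sketch below is only an illustration under stated assumptions: the two context names (`listening`, `speaking`) and all module names are placeholders invented here, not the eight fluid states or the modules of the model itself.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass(frozen=True)
class Context:
    """A cognitive context: the set of modules that may run while it holds."""
    name: str
    active_perceptors: FrozenSet[str]
    active_deciders: FrozenSet[str]


# Placeholder contexts and module names (hypothetical, for illustration only).
CONTEXTS: Dict[str, Context] = {
    "listening": Context(
        name="listening",
        active_perceptors=frozenset({"silence_detector", "prosody_tracker"}),
        active_deciders=frozenset({"take_turn_decider"}),
    ),
    "speaking": Context(
        name="speaking",
        active_perceptors=frozenset({"interruption_detector"}),
        active_deciders=frozenset({"yield_turn_decider"}),
    ),
}


class ContextSwitcher:
    """Tracks the current context and answers whether a module is active."""

    def __init__(self, contexts: Dict[str, Context], initial: str) -> None:
        self._contexts = contexts
        self.current = contexts[initial]

    def switch(self, name: str) -> None:
        # Raises KeyError for an unknown context name.
        self.current = self._contexts[name]

    def is_active(self, module: str) -> bool:
        return (
            module in self.current.active_perceptors
            or module in self.current.active_deciders
        )


switcher = ContextSwitcher(CONTEXTS, initial="listening")
listening_has_silence_detector = switcher.is_active("silence_detector")
switcher.switch("speaking")
speaking_has_silence_detector = switcher.is_active("silence_detector")
```

Because each module only checks whether it is active, perception and decision logic stay distributed across modules, while the context alone determines what the system attends to and what it may do.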
3. System Architecture
Our multi-module dialogue system is capable of real-time dialogue
with human users speaking naturally, with no artificial constraints
on the process of interaction. The architecture
follows the principles of modularity outlined above, as specified by
the CDM methodology (Thórisson et al., 2004; Thórisson, 2008), and
enables us to introduce learning into the architecture in a modular