speech recognition, animation, planning, etc., some of which may be
off-the-shelf while others are custom-built. As such systems have to
be deconstructed and reconstructed often, CDM proposes blackboards
as the backbone for such integration. This makes it relatively easy to
change information flow, add or remove computational functionality,
etc., even at runtime, as we have regularly done.
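The role of the blackboard can be illustrated with a minimal publish/subscribe sketch; the class, message types, and modules below are hypothetical, not CDM code. The point is that producers and consumers only know the blackboard, so modules can be added, removed, or rerouted without touching one another.

```python
from collections import defaultdict

class Blackboard:
    """Minimal publish/subscribe blackboard (illustrative sketch, not the CDM implementation)."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, msg_type, callback):
        # Modules register interest in a message type at any time, even at runtime.
        self._subscribers[msg_type].append(callback)

    def post(self, msg_type, payload):
        # Producers post messages without knowing who (if anyone) consumes them.
        for callback in self._subscribers[msg_type]:
            callback(payload)

# Two illustrative modules: a speech recognizer posts partial hypotheses,
# an animation module reacts to them. Either can be swapped out independently.
bb = Blackboard()
bb.subscribe("speech.partial", lambda text: print("animate listening nod for:", text))
bb.post("speech.partial", "how do I get to ...")
```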
As far as dialogue management and turn-taking are concerned,
modular or distributed approaches are scarce. Among the few is the
YTTM (Thórisson, 2002b), a model of multimodal real-time turn-
taking. YTTM proposes that processing related to turn-taking can be
separated, in a particular manner, from the processing of content (i.e.
topic). Echoing the CDM, its architectural approach is distributed
and modular and supports full-duplex multi-layered input analysis
and output generation with natural response times (real-time). One
of the background assumptions behind the approach, reinforced over
time by systems built with it (Thórisson et al., 2008; Jonsdottir, 2008;
Ng-Thow-Hing et al., 2007), is that real-time performance calls for
incremental processing of both interpretation and output generation.
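The separation the YTTM argues for can be illustrated with a minimal sketch; the class names, the simple silence rule, and the threshold below are assumptions, not the YTTM itself. A turn-taking layer makes hold/take-turn decisions incrementally from low-level timing cues, while a content layer incrementally refines its interpretation of the same unfolding input.

```python
import time

class TurnTakingLayer:
    """Decides when to take the turn from timing cues alone (illustrative sketch)."""
    def __init__(self, silence_threshold=0.5):
        self.silence_threshold = silence_threshold   # seconds of silence before responding
        self.last_speech_time = time.monotonic()

    def on_audio_frame(self, speech_detected):
        now = time.monotonic()
        if speech_detected:
            self.last_speech_time = now
            return "hold"                            # the user is still speaking
        if now - self.last_speech_time > self.silence_threshold:
            return "take_turn"                       # silence long enough: respond now
        return "wait"

class ContentLayer:
    """Interprets content incrementally and independently of turn-taking (sketch)."""
    def __init__(self):
        self.partial_interpretation = []

    def on_partial_transcript(self, words):
        self.partial_interpretation.extend(words)    # refine as each word arrives

turn_taking, content = TurnTakingLayer(), ContentLayer()
content.on_partial_transcript(["how", "do", "I", "get", "to"])
print(turn_taking.on_audio_frame(speech_detected=False))  # "wait": not enough silence yet
```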
The J.Jr. system (Thórisson, 1993) was a communicative agent
that could take turns in real-time casual conversation with a human.
It was controlled by a finite-state-machine architecture similar to the
Subsumption Architecture (Brooks, 1986). The system did not process
the content of a user's speech, but instead relied on an analysis of
prosodic information to decide when to ask questions (i.e. take the
turn) and when to interject back-channel feedback.
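A rough sketch of this kind of prosody-driven, finite-state turn-taking control follows; the states, cues, and transitions are invented for illustration and are not taken from J.Jr. itself.

```python
# Illustrative finite-state turn-taking controller driven by prosodic cues only
# (no content analysis); the states, cues, and transitions are made up.

def next_state(state, cue):
    """cue is one of: 'speech', 'short_pause', 'long_pause', 'falling_pitch'."""
    transitions = {
        ("listening", "short_pause"):   ("listening", "backchannel"),   # "mm-hm"
        ("listening", "falling_pitch"): ("maybe_done", None),
        ("maybe_done", "long_pause"):   ("speaking", "ask_question"),   # take the turn
        ("maybe_done", "speech"):       ("listening", None),            # user kept going
        ("speaking", "speech"):         ("listening", "stop_speaking"), # yield on overlap
    }
    return transitions.get((state, cue), (state, None))

state = "listening"
for cue in ["speech", "short_pause", "falling_pitch", "long_pause"]:
    state, action = next_state(state, cue)
    print(cue, "->", state, action)
```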
While modular, J.Jr.'s architecture turned out to be difficult to expand
into a larger, more intelligent architecture (Thórisson, 1996), especially
when confronted with features at different time scales and levels of
abstraction and detail (prosodic, semantic, pragmatic). Subsequent
work on Gandalf (Thórisson, 1996) incorporated mechanisms from J.Jr.
into the Ymir architecture, but presented a much more expandable,
modular system of perception modules, deciders, and action modules
in a holistic architecture that addressed content (interpretation and
generation of meaning) as well as envelope phenomena (process
control). A descendant of this architecture and methodology was
recently used in building an advanced dialogue and planning system
for the Honda ASIMO robot (Ng-Thow-Hing et al., 2007).
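The kind of decomposition this describes, with perception modules feeding deciders that trigger action modules, and with separate paths for content and envelope (process-control) decisions, can be sketched schematically as follows. The module names, cues, and routing are illustrative assumptions, not Ymir's actual implementation.

```python
# Schematic sketch of a perception -> decider -> action pipeline with separate
# content and envelope (process-control) paths; names are illustrative only.

class PerceptionModule:
    def __init__(self, name):
        self.name = name
    def perceive(self, raw_input):
        # e.g. a prosody tracker or a speech recognizer producing a percept
        return {"source": self.name, "data": raw_input}

class Decider:
    def __init__(self, layer):
        self.layer = layer  # "envelope" (turn-taking, gaze) or "content" (what to say)
    def decide(self, percept):
        if self.layer == "envelope":
            return "nod" if percept["data"] == "short_pause" else None
        return "answer_question" if percept["data"] == "question_text" else None

class ActionModule:
    def act(self, decision):
        if decision:
            print("executing:", decision)

# Envelope decisions (fast, shallow cues) and content decisions (slower, deeper
# interpretation) run side by side over percepts from different modules.
prosody = PerceptionModule("prosody_tracker")
recognizer = PerceptionModule("speech_recognizer")
envelope_decider, content_decider = Decider("envelope"), Decider("content")
actuator = ActionModule()

actuator.act(envelope_decider.decide(prosody.perceive("short_pause")))       # executing: nod
actuator.act(content_decider.decide(recognizer.perceive("question_text")))   # executing: answer_question
```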
Raux and Eskenazi (2008) presented data from a corpus analysis
of an online bus scheduling/information system, showing that a
number of dialogue features, including speech act type, can be used
to improve the identification of speech endpoints, given a silence.
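The idea can be sketched roughly as follows; the speech-act labels and threshold values are invented for illustration (Raux and Eskenazi derive theirs from corpus data), but the shape of the decision is the same: the amount of silence needed to declare an endpoint depends on dialogue context rather than being a single fixed timeout.

```python
# Illustrative only: condition the endpointing silence threshold on dialogue
# features such as speech act type, rather than using one fixed timeout.
# The labels and threshold values below are hypothetical.

DEFAULT_THRESHOLD = 0.8  # seconds of silence before declaring the turn finished

THRESHOLD_BY_SPEECH_ACT = {
    "yes_no_answer": 0.3,   # short answers end quickly
    "open_question": 1.0,   # leave room for reformulation
    "backchannel":   0.2,
}

def endpoint_reached(silence_duration, speech_act):
    threshold = THRESHOLD_BY_SPEECH_ACT.get(speech_act, DEFAULT_THRESHOLD)
    return silence_duration >= threshold

print(endpoint_reached(0.4, "yes_no_answer"))  # True: a short silence suffices
print(endpoint_reached(0.4, "open_question"))  # False: wait longer before responding
```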