A Distributed Architecture for Real-time Dialogue and On-task Learning of Efficient Co-operative Turn-taking - Coverbal Synchrony in Human-Machine Interaction


score hypothesis being available to the rest of the system during

interlocutors' speech, but the final utterance is not computed until at least

one second of silence has been detected.
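As an illustration of the silence criterion described above, the following minimal sketch waits for one second of continuous silence before declaring an utterance final. The frame format (timestamp plus speech flag) and the function name `finalization_time` are assumptions made for this example; the recognizer's actual interface is not described here.

```python
from typing import Iterable, Optional, Tuple

SILENCE_THRESHOLD_S = 1.0  # from the text: at least one second of silence


def finalization_time(frames: Iterable[Tuple[float, bool]],
                      threshold_s: float = SILENCE_THRESHOLD_S) -> Optional[float]:
    """Return the timestamp at which the final utterance may be computed.

    frames: hypothetical (timestamp_seconds, is_speech) pairs in time order.
    Silence is measured from the first silent frame after speech, so the
    result is only as precise as the frame spacing.
    Returns None if the required silence is never observed.
    """
    heard_speech = False
    silence_start: Optional[float] = None
    for t, is_speech in frames:
        if is_speech:
            heard_speech = True
            silence_start = None          # any speech resets the silence timer
        elif heard_speech:
            if silence_start is None:
                silence_start = t         # first silent frame after speech
            if t - silence_start >= threshold_s:
                return t
    return None
```

Intermediate hypotheses could still be published while speech is ongoing; only the final result waits for the silence threshold.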

3.2 Deciders

Our detailed turn-taking model consists of eight dialogue states (see

Figure 4). These are the states traversed when the turn changes

hands. The dialogue states are modeled with a distributed semi-global

context system, implementing what can (approximately) be described

as a distributed finite state machine that selectively applies to the

activation and de-activation of most modules in the system. Context

transition control (“state transitions”) in this system is managed by a

set of deciders (Thórisson, 2008). There is no theoretical limit to how

many deciders can be active for a single given system-wide context.

Likewise, there is no limit to how many deciders can manage identical

or non-identical transitions. Reactive deciders (IGTD, OWTD, ...) are

the simplest, with one decider per transition. Each contains at least

one rule about when to transition, based on both temporal and other

information. Transitions are made in a pull manner: for example, the

Other-Accepts-Turn-Decider transitions to the context Other-Accepts-Turn

(see Figure 4).
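The decider scheme described above can be sketched roughly as follows. This is a minimal illustration, not the system's implementation: the class names, the `other_speaking_s` percept, the 0.1 s threshold, and the choice of I-Give-Turn as the source context are all assumptions made for the example. Only the decider and context names Other-Accepts-Turn-Decider and Other-Accepts-Turn come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Percepts = Dict[str, float]  # hypothetical: named measurements shared by modules


@dataclass
class Decider:
    """One reactive decider: a single rule guarding a single transition."""
    name: str
    active_in: str                    # context in which the decider is active
    target: str                       # context it transitions to when its rule holds
    rule: Callable[[Percepts], bool]  # temporal or other condition


class ContextSystem:
    """Holds the system-wide dialogue context and the deciders that may change it."""

    def __init__(self, initial: str, deciders: List[Decider]) -> None:
        self.context = initial
        self.deciders = deciders

    def step(self, percepts: Percepts) -> Optional[str]:
        """Let the first active decider whose rule holds perform its transition."""
        for d in self.deciders:
            if d.active_in == self.context and d.rule(percepts):
                self.context = d.target
                return d.name
        return None


# Assumed rule: the other party has been speaking for at least 0.1 s (invented value).
oatd = Decider(
    name="Other-Accepts-Turn-Decider",
    active_in="I-Give-Turn",          # assumed source context
    target="Other-Accepts-Turn",
    rule=lambda p: p.get("other_speaking_s", 0.0) >= 0.1,
)

system = ContextSystem("I-Give-Turn", [oatd])
first = system.step({"other_speaking_s": 0.05})   # rule not satisfied, no transition
second = system.step({"other_speaking_s": 0.30})  # rule satisfied, decider fires
```

Because each decider carries its own activation context, any number of deciders can coexist for one context or compete for the same transition, which is the property the text emphasizes.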

The Dialogue Planner (DP) and Learning modules (see further

description below) can influence the dialogue state directly by sending

context transition messages I-Want-Turn, I-Accept-Turn, and I-Give-Turn;

however, all of these decisions are under the supervisory control of

the DP. If the Content Generator (CG) has content ready to be

communicated, the agent may signal that it wants the turn, and it may

signal I-Give-Turn when the content queue is empty (i.e., when it has

nothing to say). Decisions made by these modules override decisions

made by other turn-taking modules. The DP also manages

the content delivery; that is, when to start speaking, withdraw,

or raise one's voice. The CG is responsible for creating utterances

incrementally, in “thought chunks”, typically of durations shorter than

1 second. We are at present developing a dynamic content generation

system; in the meantime, following these principles, the CG simulates its

activity by selecting thought units to speak from a predefined list.

It signals when content is available to be communicated and when

content has been delivered.
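The interplay between the CG's content queue and the DP's turn signals might look roughly like the sketch below. It is an illustration under stated assumptions, not the authors' code: the class and method names, the `resolve` helper, the example thought units, and the Other-Wants-Turn message are hypothetical. The messages I-Want-Turn and I-Give-Turn and the override rule come from the text.

```python
from collections import deque
from typing import Deque, Iterable, Optional


class ContentGenerator:
    """Simulated CG: serves thought units from a predefined list."""

    def __init__(self, thought_units: Iterable[str]) -> None:
        self.queue: Deque[str] = deque(thought_units)

    def has_content(self) -> bool:
        return bool(self.queue)

    def next_unit(self) -> str:
        return self.queue.popleft()


class DialoguePlanner:
    """Chooses a turn message based on whether the CG has content queued."""

    def __init__(self, cg: ContentGenerator) -> None:
        self.cg = cg

    def turn_message(self, i_have_turn: bool) -> Optional[str]:
        if self.cg.has_content() and not i_have_turn:
            return "I-Want-Turn"      # something to say, but no turn yet
        if not self.cg.has_content() and i_have_turn:
            return "I-Give-Turn"      # queue empty: nothing left to say
        return None                   # no change requested


def resolve(dp_message: Optional[str], decider_message: Optional[str]) -> Optional[str]:
    """DP decisions override those of the other turn-taking modules."""
    return dp_message if dp_message is not None else decider_message


cg = ContentGenerator(["Hello.", "How are you?"])  # invented thought units
dp = DialoguePlanner(cg)
wants = dp.turn_message(i_have_turn=False)
spoken = [cg.next_unit(), cg.next_unit()]          # deliver everything queued
gives = dp.turn_message(i_have_turn=True)
chosen = resolve(gives, "Other-Wants-Turn")        # hypothetical competing message
```

When the DP has no opinion (`None`), the reactive deciders' message passes through unchanged, which keeps the override rule from suppressing ordinary turn-taking.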

In the present system, the module Other-Gives-Turn-Decider-2

(OGTD-2) uses the data produced by the Learner module to change the

behavior of the system. At the point when the speaker stops speaking,

the challenge for the listening agent is to decide how long to wait

before starting to speak (OGTD-1 has a static behavior of transitioning
