A Distributed Architecture for Real-time Dialogue and On-task Learning of Efficient Co-operative Turn-taking - Coverbal Synchrony in Human-Machine Interaction


5.3 Parameter settings

The main goal of the learning task is to differentiate silences in real time, based on partial information about an interlocutor's behavior (prosody only), and to predict the best reciprocal behavior. For best performance, the system needs to find the right tradeoff between shorter silences and the risk of overlapping speech. To formulate this as a Reinforcement Learning problem, we need to define states and actions for our scenario.

Using single-step Q-Learning, the feature combination in the prosody preceding the current silence becomes the state, and the length of the STW becomes the action to be learned. For efficiency, we have split the continuous action space into discrete, logarithmically spaced values (see Table 1), starting at 10 msecs and doubling up to 1.28 seconds (the maximum STW, at which the system takes the turn by default). The action selection policy for OGTD-2 is ε-greedy with 10% exploration, always selecting the shorter STW if two or more actions share the highest estimated return.
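The selection policy described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `ACTIONS`, `EPSILON` and `select_stw` are hypothetical, and the sketch assumes that an exploratory step draws uniformly from all eight STW values, which the text does not specify.

```python
import random

# STW actions in msecs: 10, doubling up to 1280 (the values in Table 1).
ACTIONS = [10 * 2 ** i for i in range(8)]
EPSILON = 0.1  # 10% exploration, as stated in the text


def select_stw(q_values, rng=random, epsilon=EPSILON):
    """Pick an STW for one state.

    q_values maps each STW (msecs) to its current estimated return.
    """
    # Exploration step. Assumption: uniform choice over all actions.
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    # Greedy step: find the highest estimated return ...
    best = max(q_values[a] for a in ACTIONS)
    # ... and break ties in favour of the shortest STW.
    return min(a for a in ACTIONS if q_values[a] == best)
```

With every estimate still at 0.0, the greedy branch returns 10, the shortest STW, because all eight actions tie.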

The reward given for decisions that do not lead to overlapping speech (i.e. successful transitions) is the negative of the selected STW's length in milliseconds: a successful 100 msec STW receives a reward of -100, and a successful 10 msec STW receives -10. If, however, overlapping speech results from the decision (i.e. the action is unsuccessful), a fixed reward of -2000 is given, a larger penalty than waiting the maximum amount of time. This encodes the preference that, of two STWs that both avoid overlap, the shorter is better. Every reward in the learning system is negative, so an unexplored action is always the best option at any given time: the return starts at 0.0 for unexplored actions, and once a reward has been given the return can only decrease. In the beginning, the agent

Table 1. Discrete actions representing STW size in msecs.

Action (STW, msecs)   Reward: successful transition   Reward: unsuccessful transition
10                    -10                             -2000
20                    -20                             -2000
40                    -40                             -2000
80                    -80                             -2000
160                   -160                            -2000
320                   -320                            -2000
640                   -640                            -2000
1280                  -1280                           -2000
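The reward scheme in Table 1 and a value update can be sketched as below. This is an illustration under stated assumptions, not the chapter's code: it reads "single-step" as treating each decision as a one-step episode with no successor-state term, and the learning rate `ALPHA`, the table `Q`, and the functions `reward` and `update` are hypothetical names and values that do not come from the text.

```python
from collections import defaultdict

OVERLAP_PENALTY = -2000  # fixed reward for an unsuccessful transition (Table 1)
ALPHA = 0.1              # hypothetical learning rate; not given in the text


def reward(stw_ms, overlapped):
    """Reward from Table 1: -STW on success, -2000 on overlapping speech."""
    return OVERLAP_PENALTY if overlapped else -stw_ms


# Q[state][stw_ms] holds the estimated return. Unexplored entries default
# to 0.0, which is above every reachable value because all rewards are negative.
Q = defaultdict(lambda: defaultdict(float))


def update(state, stw_ms, overlapped, alpha=ALPHA):
    """One-step update (assumed form): move the estimate toward the reward."""
    r = reward(stw_ms, overlapped)
    Q[state][stw_ms] += alpha * (r - Q[state][stw_ms])
    return Q[state][stw_ms]
```

Because every estimate starts at 0.0 and every reward is negative, each update can only pull a tried action below the untried ones. A greedy learner therefore tries every STW in a state at least once before it settles on one.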
