5.3 Parameter settings
The main goal of the learning task is to differentiate silences in
real-time based on partial information of an interlocutor's behavior
(prosody only) and predict the best reciprocal behavior. For best
performance, the system needs to find the right tradeoff between
shorter silences and the risk of overlapping speech. To formulate this
as a Reinforcement Learning problem, we need to define states and
actions for our scenario.
Using single-step Q-Learning, the feature combination in the
prosody preceding the current silence becomes the state, and the length
of the STW becomes the action to be learned. For efficiency, we have
split the continuous action space into discrete, logarithmically spaced
values (see Table 1), starting at 10 msecs and doubling up to
1.28 seconds (the maximum STW, at which the system takes the turn by
default). The action selection policy for OGTD-2 is ε-greedy with 10%
exploration, always selecting the shorter STW if two or more actions
share the top spot.
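
The discretization and selection rule described above can be sketched
as follows. This is a minimal illustration under stated assumptions, not
the authors' implementation: the names STW_ACTIONS_MS and select_stw,
and the layout of the value estimates as a plain list, are hypothetical.

```python
import random

# Eight discrete STW values in msecs: 10, doubled up to 1280.
STW_ACTIONS_MS = [10 * 2 ** i for i in range(8)]

EPSILON = 0.10  # 10% exploration, as stated in the text


def select_stw(q_values, epsilon=EPSILON, rng=random):
    """Epsilon-greedy choice over the STW actions for one state.

    q_values[i] is the current estimate for STW_ACTIONS_MS[i]
    (an assumed layout). Ties for the best value go to the shorter STW.
    """
    if rng.random() < epsilon:
        # Explore: pick any STW uniformly at random.
        return rng.choice(STW_ACTIONS_MS)
    best = max(q_values)
    # list.index returns the first match; the actions are sorted in
    # ascending order, so the first match is the shortest STW.
    return STW_ACTIONS_MS[q_values.index(best)]
```

With all estimates equal, for example select_stw([0.0] * 8, epsilon=0.0),
the greedy branch returns the 10 msec STW.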
The reward given for decisions that do not lead to overlapping
speech (i.e. successful transitions) is the negative of the selected
STW in milliseconds; a 100 msec STW receives a reward of -100 if
successful, and an STW of 10 msecs receives -10. This encodes the
preference that, of two STWs that both avoid overlap, the shorter is
better. If, however, overlapping speech results from the decision (i.e.
the action is unsuccessful), a fixed reward of -2000 is given, a larger
penalty than waiting the maximum amount of time. Every reward in the
learning system is negative, so an unexplored action is always the best
option: the return starts at 0.0 for unexplored actions, and once a
reward has been given the return can only decrease. In the beginning, the agent
Table 1. Discrete actions representing STW size in msecs.

Action (STW)   Reward: Successful transition   Reward: Unsuccessful transition
10             -10                             -2000
20             -20                             -2000
40             -40                             -2000
80             -80                             -2000
160            -160                            -2000
320            -320                            -2000
640            -640                            -2000
1280           -1280                           -2000
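
The reward scheme in Table 1, and the effect of starting every return
at 0.0, can be sketched as below. The function names and the update rule
are assumptions made for illustration: the text does not give a learning
rate, so alpha = 0.1 is a hypothetical value, and the update treats each
decision as a single step with no successor state.

```python
OVERLAP_PENALTY = -2000.0  # fixed reward for an unsuccessful transition


def reward(stw_ms, overlapped):
    """Reward for one decision: -STW (msecs) on success, -2000 on overlap."""
    return OVERLAP_PENALTY if overlapped else -float(stw_ms)


def update(q, r, alpha=0.1):
    """One-step value update; alpha is an assumed learning rate."""
    return q + alpha * (r - q)


actions = [10, 20, 40, 80, 160, 320, 640, 1280]
q = {a: 0.0 for a in actions}  # returns start at 0.0 for unexplored actions

# Try the 10 msec STW once and succeed: its estimate drops below zero.
q[10] = update(q[10], reward(10, overlapped=False))

# A greedy choice now prefers a still-unexplored action (value 0.0);
# among equally valued actions the shortest STW wins.
best_value = max(q.values())
greedy = min(a for a in actions if q[a] == best_value)
```

Because every reward is negative, any action that has been tried once
falls below the untouched ones, so greedy selection works through the
unexplored STWs from shortest to longest before settling.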