$\max_k [Q_t^{T}(j,k)]$: maximum Q value of routing from server j to the next k server(s) ($k \ge 1$).
Equation (9.2) updates the Q value of a backup choice of routes, $Q_t^{T}(i,j)$, based
on the Q value that maximizes over all routes possible in the next state,
$\max_k [Q_t^{T}(j,k)]$. In each transition, entities choose the next server according
to the updated $Q_t^{T}(i,j)$.
2. Q online learning algorithm of error-saving reward
$$Q_{t+1}^{E}(i,j) = Q_t^{E}(i,j) + \varepsilon \left\{ r_t + \gamma \max_k [Q_t^{E}(j,k)] - Q_t^{E}(i,j) \right\}. \qquad (9.3)$$
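As a minimal sketch of how the update in Equation (9.3) could be computed, the Python function below applies one update step to the route from server i to server j. The function name `q_update`, the dictionary-based Q table, the default values of `epsilon` and `gamma`, and the choice to treat unseen routes as having a Q value of 0 are all assumptions made for illustration; they are not taken from the model described here.

```python
def q_update(q, i, j, reward, next_servers, epsilon=0.1, gamma=0.9):
    """Apply one step of the Equation (9.3) update to the route i -> j.

    q            : dict mapping (from_server, to_server) -> Q value
    reward       : error-saving reward r_t observed on this transition
    next_servers : servers k reachable from j
    epsilon      : weight on the correction term (assumed default)
    gamma        : weight on the best next-state Q value (assumed default)
    """
    # Largest Q value over the routes leaving j; 0.0 when j has no
    # successors or a route has not been seen yet (an assumption).
    best_next = max((q.get((j, k), 0.0) for k in next_servers), default=0.0)
    old = q.get((i, j), 0.0)
    # Move the old value toward reward + gamma * best_next.
    q[(i, j)] = old + epsilon * (reward + gamma * best_next - old)
    return q[(i, j)]
```

The same function would serve for the time-saving Q value if it were given the time-saving reward and a separate table, assuming both updates share this form.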
3. Trade-off of the two Q values
The choice of routes is determined by the trade-off between the two Q values. Currently,
it is assumed that $Q_{t+1}^{E}(i,j)$ of the error-saving reward has higher priority
than $Q_{t+1}^{T}$ of the time-saving reward: if $Q_{t+1}^{E}(i,j) > Q_{t+1}^{E}(i,k)$,
the entity will choose the next server j whatever the value of $Q_{t+1}^{T}$; if
$Q_{t+1}^{E}(i,j) = Q_{t+1}^{E}(i,k)$, the entity will choose the next server with the
greater $Q_{t+1}^{T}$; if $Q_{t+1}^{E}(i,j) = Q_{t+1}^{E}(i,k)$ and
$Q_{t+1}^{T}(i,j) = Q_{t+1}^{T}(i,k)$, the entity will choose the next server randomly.
With these equations, we were able to successfully integrate queuing networks with
reinforcement learning algorithms.
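The priority rule above amounts to a lexicographic comparison: error-saving Q value first, time-saving Q value second, random choice last. The following Python sketch illustrates one way to express it. The function name `choose_next_server`, the dictionary-based Q tables, and the use of exact equality to detect ties are illustrative assumptions; a practical implementation might compare floating-point values within a tolerance.

```python
import random


def choose_next_server(i, candidates, q_error, q_time, rng=random):
    """Pick the next server for an entity currently at server i.

    candidates : servers reachable from i
    q_error    : dict (i, k) -> error-saving Q value
    q_time     : dict (i, k) -> time-saving Q value
    rng        : object with a choice() method (defaults to the random module)
    """
    # Priority 1: keep only the routes with the largest error-saving Q value.
    best_e = max(q_error[(i, k)] for k in candidates)
    tied = [k for k in candidates if q_error[(i, k)] == best_e]
    # Priority 2: among those, keep the routes with the largest time-saving Q value.
    best_t = max(q_time[(i, k)] for k in tied)
    tied = [k for k in tied if q_time[(i, k)] == best_t]
    # Priority 3: any remaining tie is broken at random.
    return rng.choice(tied)
```

With two candidate servers, a larger error-saving value wins regardless of the time-saving values; the time-saving values matter only when the error-saving values are equal.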
9.2.4 Model Predictions of three Skill Learning Phenomena and two Brain Imaging Phenomena
The three skill learning phenomena and the two brain imaging phenomena of
transcription typing described earlier in this chapter can be predicted by the
queuing network model with reinforcement learning.
9.2.4.1 Predictions of the three Skill Learning Phenomena
We assume that the processing times of the CE, BG, and SMA servers follow the
exponential distribution (see Table 9.1 and Fig. 9.1) and are independent of one
another. Therefore, if $Y_1, \ldots, Y_k$ are k independent exponential random
variables with a common rate $\lambda$, representing the processing times of the
servers in our network, their sum X follows an Erlang distribution. Based on the
properties of Erlang distributions, we have

$$X = \sum_{i=1}^{k} Y_i, \qquad (9.4)$$

$$E[X] = E\left[\sum_{i=1}^{k} Y_i\right] = \sum_{i=1}^{k} E[Y_i] = k \frac{1}{\lambda}, \qquad (9.5)$$
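Equation (9.5) can be checked numerically. The sketch below, which is only an illustration and not part of the model, draws k exponential processing times with the same rate, sums them, and averages the sums over many trials. The function name `simulate_total_time`, the sample size, and the fixed seed are assumptions chosen for the example.

```python
import random


def simulate_total_time(k, lam, n_samples=100_000, seed=0):
    """Estimate E[X] for X = Y_1 + ... + Y_k with Y_i ~ Exponential(rate=lam)."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    total = 0.0
    for _ in range(n_samples):
        # One realization of X: the sum of k independent exponential times.
        total += sum(rng.expovariate(lam) for _ in range(k))
    return total / n_samples


if __name__ == "__main__":
    k, lam = 3, 2.0
    print("simulated mean:", simulate_total_time(k, lam))
    print("k / lambda    :", k / lam)
```

For k = 3 and a rate of 2.0, the simulated mean should lie close to k/λ = 1.5, with the gap shrinking as the number of samples grows.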