mathematical operation tasks better than the power law [19] and has been applied in modeling long-term memory retrieval [1]; we used it to model the learning processes of the individual servers:
\[ 1/\mu_{BG} = A_{BG} + B_{BG}\,\exp(-\alpha_{BG} N_{BG}), \tag{9.1} \]
$1/\mu_{BG}$: motor program retrieval time; $A_{BG}$: the minimal processing time of the BG server after practice (314 ms, [35]); $B_{BG}$: the change in the expected processing time from the beginning to the end of practice (2 × 314 = 628 ms, assumed); $\alpha_{BG}$: the learning rate of server BG (0.00142, [18]); $N_{BG}$: the number of digraphs (letter pairs excluding the space key) processed by server BG, implemented as a matrix of digraph frequencies recorded in the LTPM server.
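To make Eq. (9.1) concrete, the sketch below evaluates the learning curve with the parameter values listed above; the function name and the sample values of $N_{BG}$ are our own illustrative choices, not part of the model.

```python
import math

def bg_retrieval_time(n_bg: float,
                      a_bg: float = 314.0,       # minimal processing time after practice (ms)
                      b_bg: float = 628.0,       # change in expected processing time (2 x 314 ms)
                      alpha_bg: float = 0.00142  # learning rate of server BG
                      ) -> float:
    """Expected motor program retrieval time 1/mu_BG (ms) after the BG
    server has processed n_bg digraphs, per Eq. (9.1)."""
    return a_bg + b_bg * math.exp(-alpha_bg * n_bg)

# Retrieval time decays toward the 314 ms floor as practice accumulates.
for n in (0, 1000, 10000):
    print(f"N_BG = {n:>5}: 1/mu_BG = {bg_retrieval_time(n):6.1f} ms")
```

With these values the retrieval time starts at 942 ms with no practice and approaches the 314 ms minimum as the digraph count grows.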
Self-Organization of the Queuing Network
If the entities traversing the network try to maximize their information processing speed and minimize error, it is appropriate to apply reinforcement learning algorithms to quantify this dynamic process. Reinforcement learning is a computational approach that quantifies how an agent maximizes the total reward it receives while interacting with a complex, uncertain environment [46]. Reinforcement learning has also been applied in modeling motor learning in neuroscience [33] and may therefore be appropriate for modeling brain network organization. To integrate reinforcement learning algorithms with the queuing network approach, it is necessary to define the state, transitions, and reward values of reinforcement learning in terms of queuing network concepts. Below are the definitions:
1. State: the status of an entity being in server $i$.
2. Transition: an entity is routed from server $i$ to server $j$.
3. Time-saving reward ($r_t$): $r_t = (1/w_q) + \mu_{j,t}$, where $w_q$ is the time the entity spent waiting in the queue of the server, and $\mu_{j,t}$ is the processing speed of the entity at that server.
4. Error-saving reward ($r_t$): $r_t = 1/(N_{error\,j,t} + 1)$, where $N_{error\,j,t}$ is the number of action errors made by previous entities in the next server $j$ at the $t$-th transition (both rewards are illustrated in the sketch following this list).
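Both reward terms can be expressed directly as functions of the queuing quantities defined above; the following is a minimal sketch, with function names of our own choosing.

```python
def time_saving_reward(w_q: float, mu_jt: float) -> float:
    """r_t = (1 / w_q) + mu_{j,t}: the reward grows as waiting time in
    the queue shrinks and as the processing speed at server j rises."""
    return (1.0 / w_q) + mu_jt

def error_saving_reward(n_error_jt: int) -> float:
    """r_t = 1 / (N_error_{j,t} + 1): the reward shrinks as previous
    entities make more action errors in the next server j."""
    return 1.0 / (n_error_jt + 1)

print(time_saving_reward(w_q=0.5, mu_jt=2.0))  # 4.0
print(error_saving_reward(n_error_jt=3))       # 0.25
```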
Online Q-learning algorithms from reinforcement learning are used to quantify the process by which entities choose among different routes based on the rewards of those routes.
1. Online Q-learning algorithm for the time-saving reward:

\[ Q^{T}_{t+1}(i,j) = Q^{T}_{t}(i,j) + \varepsilon \left\{ r_t + \gamma \max_{k}\left[ Q^{T}_{t}(j,k) \right] - Q^{T}_{t}(i,j) \right\}, \tag{9.2} \]

$\varepsilon$: learning rate of online Q-learning ($0 < \varepsilon < 1$, $\varepsilon = 0.99$); $\gamma$: discount parameter of routing to the next server ($0 < \gamma < 1$, $\gamma = 0.3$); $Q^{T}_{t+1}(i,j)$: online Q value if the entity routes from server $i$ to server $j$ at the $(t+1)$-th transition, based on the time-saving reward;
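Eq. (9.2) is the standard online Q-learning update applied to server-to-server routing. The sketch below applies one such update with the stated parameter values; the dictionary-based Q table, the server labels, and the `successors` argument are our own illustrative assumptions.

```python
from collections import defaultdict

EPSILON = 0.99  # learning rate of online Q-learning
GAMMA = 0.3     # discount parameter of routing to the next server

# Q[(i, j)]: online Q value of routing an entity from server i to
# server j; unvisited routes default to 0.
Q = defaultdict(float)

def update_q(i, j, r_t, successors):
    """One application of Eq. (9.2) after an entity routed i -> j and
    collected reward r_t; `successors` lists the servers k reachable
    from j."""
    best_next = max((Q[(j, k)] for k in successors), default=0.0)
    Q[(i, j)] += EPSILON * (r_t + GAMMA * best_next - Q[(i, j)])

# Example: an entity routes from server "A" to "B" with reward 0.8,
# where "B" can route onward to "C" or "D".
update_q("A", "B", r_t=0.8, successors=["C", "D"])
print(Q[("A", "B")])  # 0.792 on the first update (all Q values start at 0)
```

Because $\varepsilon$ is close to 1, the Q value of a route moves almost all the way to its newly observed return on each transition, so the network reorganizes its routing quickly as rewards change.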