number of entities processed by server BG, which is implemented as a matrix of frequencies recorded in the LTPM server.
For the Hicog and PM servers, to avoid building an ad hoc model that directly uses the results of the experiment being simulated, nine parameters of the Hicog and PM servers were calculated from previous studies (see Appendix 1).
9.3.1.2 Learning Process in the Simplest Queuing Network with Two Routes
Based on the learning process of individual servers, the condition under which an entity switches between the two routes in the simplest form of a queuing network with two routes (each server capacity equals 1), from route 1 → 2 → 4 to route 1 → 3 → 4 (see Fig. 9.6), was quantified and proved by the following mathematical deduction.
1. Q online learning equation [46]
$$
Q_{t+1}(i,j) = Q_t(i,j) + \varepsilon \left\{ r_t + \gamma \max_k \left[ Q_t(j,k) \right] - Q_t(i,j) \right\}, \qquad (9.8)
$$
where Q_{t+1}(i, j) is the online Q value if an entity routes from server i to server j in the (t+1)th transition; max_k[Q(j, k)] represents the maximum Q value for routing from server j to the next k server(s) (k ≥ 1); r_t = μ_{j,t} is the reward and is the processing speed of server j if the entity enters it at the t-th transition; N_{jt} represents the number of entities going to server j at the t-th transition; ε is the learning rate of Q online learning (0 < ε < 1); γ is the discount parameter of routing to the next server (0 < γ < 1); and p is the probability that an entity routing from server 1 does not follow the Q online learning rule. For example, if p = 0.1, then 10% of entities will go from server 1 to server 2 even though Q(1, 3) > Q(1, 2).
A state is the status of an entity in server i; a transition is defined as an entity routed from server i to j. Equation (9.8) updates the Q value of a backup choice of routes (Q_{t+1}(i, j)) based on the Q value that maximizes over all routes possible in the next state (max_k[Q(j, k)]). In each transition, entities choose the next server according to the updated Q_t(i, j). If Q(1, 3) > Q(1, 2), more entities will go from server 1 to server 3 rather than to server 2.
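To make the bookkeeping behind Eq. (9.8) and the probability p concrete, the following minimal Python sketch updates route Q values and chooses between servers 2 and 3. The constants, the dictionary layout, and the helper names (update_q, choose_route_from_1) are illustrative assumptions, not the simulation code described in this chapter.

```python
import random

# Illustrative constants (assumed values, not taken from the chapter)
EPSILON = 0.3   # learning rate epsilon, 0 < epsilon < 1
GAMMA = 0.8     # discount parameter gamma, 0 < gamma < 1
P_IGNORE = 0.1  # probability p of not following the Q online learning rule

# Q values for the transitions of the two-route network 1->2->4 and 1->3->4
Q = {(1, 2): 0.0, (1, 3): 0.0, (2, 4): 0.0, (3, 4): 0.0}
NEXT = {1: [2, 3], 2: [4], 3: [4], 4: []}   # possible next servers


def update_q(i, j, mu_j):
    """Apply Eq. (9.8) after an entity moves from server i to server j."""
    reward = mu_j  # r_t = mu_{j,t}, the processing speed of server j
    best_next = max((Q[(j, k)] for k in NEXT[j]), default=0.0)
    Q[(i, j)] += EPSILON * (reward + GAMMA * best_next - Q[(i, j)])


def choose_route_from_1():
    """Pick server 2 or 3; with probability p the Q rule is not followed."""
    preferred = 2 if Q[(1, 2)] >= Q[(1, 3)] else 3
    other = 3 if preferred == 2 else 2
    return other if random.random() < P_IGNORE else preferred
```

For instance, if server 2 is consistently faster (μ_{2,t} > μ_{3,t}), repeated calls to update_q(1, 2, mu_2) and update_q(1, 3, mu_3) push Q(1, 2) above Q(1, 3), so choose_route_from_1 sends a fraction 1 − p of entities from server 1 to server 2.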
2. Assumption
ε is a constant which does not change in the current learning process (0 < ε < 1). The processing speed of server 4 (μ_4) is constant.
3. Lemma 9.1. At any transition state t (t ≥ 0), if 1/μ_{2,t} < 1/μ_{3,t}, then Q_{t+1}(1, 2) > Q_{t+1}(1, 3).
Proof of Lemma 9.1 (see Appendix 2).
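As a quick illustration (not a substitute for the proof in Appendix 2), assume both routes start from equal Q values, Q_t(1, 2) = Q_t(1, 3), and equal downstream terms, Q_t(2, 4) = Q_t(3, 4). Subtracting the two instances of Eq. (9.8) then leaves only the rewards:
$$
Q_{t+1}(1,2) - Q_{t+1}(1,3) = \varepsilon\,(\mu_{2,t} - \mu_{3,t}),
$$
which is positive exactly when 1/μ_{2,t} < 1/μ_{3,t}, i.e., when server 2 processes entities faster than server 3.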
Based on Lemma 9.1 and Equation (9.7), we obtained Lemma 9.2:
4. Lemma 9.2. At any transition state t (t ≥ 0), if A_2 + B_2 Exp(α_2 N_{2t}) < A_3 + B_3 Exp(α_3 N_{3t}), then Q_{t+1}(1, 2) > Q_{t+1}(1, 3).
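Assuming Eq. (9.7) gives the processing time of server j as a learning-curve function of the number of entities it has processed, 1/μ_{j,t} = A_j + B_j Exp(α_j N_{jt}) (a form inferred from the lemma statement rather than quoted from the equation itself), Lemma 9.2 is Lemma 9.1 with that substitution:
$$
A_2 + B_2 e^{\alpha_2 N_{2t}} < A_3 + B_3 e^{\alpha_3 N_{3t}}
\;\Longleftrightarrow\;
\frac{1}{\mu_{2,t}} < \frac{1}{\mu_{3,t}}
\;\Longrightarrow\;
Q_{t+1}(1,2) > Q_{t+1}(1,3).
$$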