mathematical operation tasks better than the power law [19] and has been applied in modeling long-term memory retrieval [1]; we used it to model the learning processes of the individual servers:
\[ 1/\mu_{BG} = A_{BG} + B_{BG}\,\exp(-\alpha_{BG} N_{BG}), \tag{9.1} \]
$1/\mu_{BG}$: motor program retrieval time; $A_{BG}$: the minimal processing time of the BG server after practice (314 ms, [35]); $B_{BG}$: the change in the expected processing time from the beginning to the end of practice (2 × 314 = 628 ms, assumed); $\alpha_{BG}$: the learning rate of server BG (0.00142, [18]); $N_{BG}$: the number of digraphs (letter pairs excluding the space key) processed by server BG, implemented as a matrix of digraph frequencies recorded in the LTPM server.
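To make Eq. (9.1) concrete, the sketch below evaluates the learning curve with the parameter values listed above; the function name and the sample values of $N_{BG}$ are our own illustrative choices, not part of the model.

```python
import math

def bg_retrieval_time(n_bg: float,
                      a_bg: float = 314.0,       # minimal processing time after practice (ms)
                      b_bg: float = 628.0,       # change in expected processing time (2 x 314 ms)
                      alpha_bg: float = 0.00142  # learning rate of server BG
                      ) -> float:
    """Expected motor program retrieval time 1/mu_BG (ms) after the BG
    server has processed n_bg digraphs, per Eq. (9.1)."""
    return a_bg + b_bg * math.exp(-alpha_bg * n_bg)

# Retrieval time decays toward the 314 ms floor as practice accumulates.
for n in (0, 1000, 10000):
    print(f"N_BG = {n:>5}: 1/mu_BG = {bg_retrieval_time(n):6.1f} ms")
```

With these values the retrieval time starts at 942 ms with no practice and approaches the 314 ms minimum as the digraph count grows.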
Self-Organization of the Queuing Network
If the entities traversing the network try to maximize their information processing speed and minimize error, it is appropriate to apply reinforcement learning algorithms to quantify this dynamic process. Reinforcement learning is a computational approach that quantifies how an agent maximizes the total reward it receives while interacting with a complex, uncertain environment [46]. Reinforcement learning has also been applied in modeling motor learning in neuroscience [33] and may therefore be appropriate for modeling brain network organization. To integrate reinforcement learning algorithms with the queuing network approach, it is necessary to define the state, transitions, and reward values of reinforcement learning in terms of queuing network concepts. Below are the definitions:
1. State: the status of an entity being in server $i$.
2. Transition: an entity is routed from server $i$ to server $j$.
3. Time-saving reward ($r_t$): $r_t = (1/w_q) + \mu_{j,t}$, where $w_q$ is the time the entity spent waiting in the queue of the server, and $\mu_{j,t}$ is the processing speed of the entity at that server.
4. Error-saving reward ($r_t$): $r_t = 1/(N_{error\,j,t} + 1)$, where $N_{error\,j,t}$ is the number of action errors made by previous entities in the next server $j$ at the $t$-th transition (both rewards are illustrated in the sketch following this list).
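Both reward terms can be expressed directly as functions of the queuing quantities defined above; the following is a minimal sketch, with function names of our own choosing.

```python
def time_saving_reward(w_q: float, mu_jt: float) -> float:
    """r_t = (1 / w_q) + mu_{j,t}: the reward grows as waiting time in
    the queue shrinks and as the processing speed at server j rises."""
    return (1.0 / w_q) + mu_jt

def error_saving_reward(n_error_jt: int) -> float:
    """r_t = 1 / (N_error_{j,t} + 1): the reward shrinks as previous
    entities make more action errors in the next server j."""
    return 1.0 / (n_error_jt + 1)

print(time_saving_reward(w_q=0.5, mu_jt=2.0))  # 4.0
print(error_saving_reward(n_error_jt=3))       # 0.25
```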
Online Q-learning algorithms from reinforcement learning are used to quantify the process by which entities choose among different routes based on the rewards of those routes.
1. Online Q-learning algorithm for the time-saving reward:

\[ Q^{T}_{t+1}(i,j) = Q^{T}_{t}(i,j) + \varepsilon \left\{ r_t + \gamma \max_{k}\left[ Q^{T}_{t}(j,k) \right] - Q^{T}_{t}(i,j) \right\}, \tag{9.2} \]

$\varepsilon$: learning rate of online Q-learning ($0 < \varepsilon < 1$, $\varepsilon = 0.99$); $\gamma$: discount parameter of routing to the next server ($0 < \gamma < 1$, $\gamma = 0.3$); $Q^{T}_{t+1}(i,j)$: online Q value if the entity routes from server $i$ to server $j$ at the $(t+1)$-th transition, based on the time-saving reward;
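Eq. (9.2) is the standard online Q-learning update applied to server-to-server routing. The sketch below applies one such update with the stated parameter values; the dictionary-based Q table, the server labels, and the `successors` argument are our own illustrative assumptions.

```python
from collections import defaultdict

EPSILON = 0.99  # learning rate of online Q-learning
GAMMA = 0.3     # discount parameter of routing to the next server

# Q[(i, j)]: online Q value of routing an entity from server i to
# server j; unvisited routes default to 0.
Q = defaultdict(float)

def update_q(i, j, r_t, successors):
    """One application of Eq. (9.2) after an entity routed i -> j and
    collected reward r_t; `successors` lists the servers k reachable
    from j."""
    best_next = max((Q[(j, k)] for k in successors), default=0.0)
    Q[(i, j)] += EPSILON * (r_t + GAMMA * best_next - Q[(i, j)])

# Example: an entity routes from server "A" to "B" with reward 0.8,
# where "B" can route onward to "C" or "D".
update_q("A", "B", r_t=0.8, successors=["C", "D"])
print(Q[("A", "B")])  # 0.792 on the first update (all Q values start at 0)
```

Because $\varepsilon$ is close to 1, the Q value of a route moves almost all the way to its newly observed return on each transition, so the network reorganizes its routing quickly as rewards change.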