Information Technology Reference
visits n ( s, a ) is the accessed total number of the state-action pair (s,a) within n
And in order to facilitate the iterative approximation of Q ( s, a ), one state-
action pair (s,a) is mapped to a Q value. In this paper, there is:
( s, a )=(( t, wl ) ,a )=( t, wl, a )
Where, T and R are finite sets. Let t
T is the task will be allocated. The
workload of one resource r
candidates ( t ) is defined as wl ( r ):
wl ( r )=
avg wl ( t ))
>avgwl ( t ))
Here, the function avg wl ( t ) is the average length of all candidate resources'
work list for the task t . For example, if there are no enabled work items that are
allocated to the resource r , wl ( r )= FREE , and if the work list's length of the
resource r is less (higher) than the avg wl ( t ), wl ( r )= LOW ( HIGH ).
Intuitively, the estimation function Q ( s, a ) can be mapped to a big table:
Q = Q Table ( t, r, wl ( r ))
For instance, Q Table ( t 1 ,r 1 ,FREE ) is a value achieved from formula (6) when
task t 1 is allocated to resource r 1 and its workload is FREE .
Table1 shows the main steps of Q-learning algorithm for task allocation with-
out SR as follow:
Tabl e 1. Q-learning Algorithm without SR
1Foreach s and a ,the Q Table value is initialized as 0.
2 Get the current state s
3 Repeat until the system stop:
Choose the resource r pick according to the value of ˆ
Q ( s,a ) ,a← r pick ,and
then allocate the task t to r pick
Allocate processing time for the task
Save ( s,a ) according to the corresponding case
A new enabled item is came or a work item is completed.
Get a new state s
ead ( s,a ) according to the new case ˃
Receive the reward r
Update the Q Table value according to the following formula:
Q n ( s,a ) ← (1 − ʱ n ) ˆ
Q n− 1 ( s,a )+ ʱ n [ r + ʳmax a ˆ
Q n− 1 ( s ,a )]
s ← s , ˃ ← ˃