$visits_n(s,a)$ denotes the total number of times the state-action pair $(s,a)$ has been visited within $n$ loops.
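As an illustration, such a counter can be kept in a dictionary keyed by $(s,a)$; the decayed learning rate $\alpha_n = 1/(1+visits_n(s,a))$ in the sketch below is a common choice for nondeterministic Q-learning and is an assumption here, since formula (6) appears earlier in the paper and its exact schedule is not reproduced in this excerpt.

```python
from collections import defaultdict

# visits_n(s, a): how many times the pair (s, a) has been tried so far.
visits = defaultdict(int)

def learning_rate(s, a):
    # Assumed decay schedule alpha_n = 1 / (1 + visits_n(s, a)); a common
    # choice for nondeterministic Q-learning (the paper's formula (6)
    # may differ).
    return 1.0 / (1.0 + visits[(s, a)])
```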
To facilitate the iterative approximation of $Q(s,a)$, each state-action pair $(s,a)$ is mapped to a $Q$ value. In this paper:
$$(s,a) = ((t, wl), a) = (t, wl, a) \in T \times Wl \times R$$
where $T$ and $R$ are finite sets. Let $t \in T$ be the task to be allocated. The workload of a resource $r \in candidates(t)$ is defined as $wl(r)$:
$$
wl(r) =
\begin{cases}
FREE, & |r.wl| = 0 \\
LOW,  & |r.wl| \le avgwl(t) \\
HIGH, & |r.wl| > avgwl(t)
\end{cases}
\qquad (7)
$$
Here, the function $avgwl(t)$ is the average work-list length over all candidate resources for the task $t$. For example, if no enabled work items are allocated to the resource $r$, then $wl(r) = FREE$; and if the work-list length of the resource $r$ is less than (greater than) $avgwl(t)$, then $wl(r) = LOW$ ($HIGH$).
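To make the discretization concrete, the following is a minimal Python sketch of formula (7); the names (`candidates_wl`, `avg_wl`) are illustrative, not from the paper.

```python
def avg_wl(candidates_wl):
    """Average work-list length over all candidate resources of a task."""
    return sum(candidates_wl.values()) / len(candidates_wl)

def wl(r, candidates_wl):
    """Discretize the workload of resource r as in formula (7).

    candidates_wl maps each candidate resource of the task t to its
    current work-list length |r.wl|.
    """
    length = candidates_wl[r]
    if length == 0:
        return "FREE"   # no enabled work items allocated to r
    if length <= avg_wl(candidates_wl):
        return "LOW"    # at or below the candidates' average
    return "HIGH"       # above the candidates' average
```

For example, with work-list lengths `{"r1": 0, "r2": 2, "r3": 4}` the average is 2, so `wl` yields `FREE` for `r1`, `LOW` for `r2`, and `HIGH` for `r3`.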
Intuitively, the estimation function $Q(s,a)$ can be mapped to a big table:
$$Q = QTable(t, r, wl(r))$$
For instance, $QTable(t_1, r_1, FREE)$ is the value obtained from formula (6) when task $t_1$ is allocated to resource $r_1$ and $r_1$'s workload is $FREE$.
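Under this mapping, the table can be represented directly as a dictionary keyed by $(t, r, wl(r))$; a minimal sketch, with keys shown as plain strings for brevity:

```python
from collections import defaultdict

# QTable(t, r, wl(r)) as a dictionary with every entry defaulting to 0,
# matching step 1 of the algorithm in Table 1.
q_table = defaultdict(float)

# Reading QTable(t1, r1, FREE): the estimated value, per formula (6), of
# allocating task t1 to resource r1 while r1's work list is empty.
value = q_table[("t1", "r1", "FREE")]
```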
Table 1 shows the main steps of the Q-learning algorithm for task allocation without SR:
Table 1. Q-learning Algorithm without SR

1.  For each $s$ and $a$, the $QTable$ value is initialized as 0.
2.  Get the current state $s$.
3.  Repeat until the system stops:
4.    Choose the resource $r_{pick}$ according to the value of $\hat{Q}(s,a)$, set $a \leftarrow r_{pick}$, and then allocate the task $t$ to $r_{pick}$.
5.    Allocate processing time for the task.
6.    Save $(s,a)$ according to the corresponding case.
7.    When a new enabled work item arrives or a work item is completed, get a new state $s'$.
8.    Read $(s,a)$ according to the new case.
9.
10.   Receive the reward $r$.
11.   Update the $QTable$ value according to the following formula:
      $\hat{Q}_n(s,a) \leftarrow (1-\alpha_n)\,\hat{Q}_{n-1}(s,a) + \alpha_n\big[r + \gamma \max_{a'} \hat{Q}_{n-1}(s',a')\big]$
12.   $s \leftarrow s'$
13.
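A compact Python sketch of Table 1 follows. The engine callbacks (`get_state`, `candidates`, `execute`) are hypothetical stand-ins for the workflow system, and the $\varepsilon$-greedy selection rule, the decay schedule $\alpha_n = 1/(1+visits_n(s,a))$, and the values of `gamma` and `epsilon` are assumptions not specified in this excerpt.

```python
import random
from collections import defaultdict

def q_learning_allocation(get_state, candidates, execute,
                          gamma=0.8, epsilon=0.1, episodes=1000):
    """Sketch of Table 1 (Q-learning without SR).

    get_state()    -> current state s = (t, wl)
    candidates(s)  -> resources eligible for the task in state s
    execute(s, r)  -> (reward, next_state) once the allocated work item
                      completes or a new one is enabled
    """
    q = defaultdict(float)       # step 1: QTable initialized to 0
    visits = defaultdict(int)
    s = get_state()              # step 2: observe the current state
    for _ in range(episodes):    # step 3: repeat until the system stops
        # Step 4: choose r_pick from Q-hat(s, a); epsilon-greedy is an
        # assumed selection rule, since the paper does not specify one.
        if random.random() < epsilon:
            a = random.choice(candidates(s))
        else:
            a = max(candidates(s), key=lambda r: q[(s, r)])
        visits[(s, a)] += 1
        reward, s_next = execute(s, a)  # steps 5-10: run item, get r, s'
        # Step 11: decayed-rate update; alpha_n = 1/(1 + visits) assumed.
        alpha = 1.0 / (1.0 + visits[(s, a)])
        best_next = max((q[(s_next, r)] for r in candidates(s_next)),
                        default=0.0)
        q[(s, a)] = (1 - alpha) * q[(s, a)] + \
                    alpha * (reward + gamma * best_next)
        s = s_next               # step 12: s <- s'
    return q
```

The $(1-\alpha_n)$ weighting matches step 11: because processing times are stochastic, the old estimate is retained in proportion to how often $(s,a)$ has already been visited, so later updates perturb a well-visited entry less.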