2.2 Reinforcement Learning Algorithm

Reinforcement learning is a class of algorithms in which an agent makes decisions based on the feedback it receives from interacting with its environment in the current state. Reinforcement learning algorithms are divided into two types according to the optimization criterion of the underlying Markov decision processes (MDPs) [10]: discounted-return reinforcement learning and average-reward reinforcement learning. TD(λ), Sarsa, and Q-learning belong to the first type; R-learning and H-learning belong to the second. Reinforcement learning is widely used to solve problems in different fields. For example, R-learning has been used to study parallel-machine scheduling problems that aim to minimize the flow time of jobs [11]. Because R-learning is an average-reward method, however, it is not suitable for workflow problems. Mehmet Emin Aydin et al. applied Q-learning to the job-shop scheduling problem [12]. Zhengxing Huang et al. used Q-learning to address work distribution problems in business process management [13][14]. These papers show that Q-learning is able to solve workflow problems modeled as MDPs.
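
As a point of reference for the discounted-return type, the following is a minimal sketch of one tabular Q-learning update. It is not taken from the cited papers: the state and action names, the reward value, and the parameters `alpha` (learning rate) and `gamma` (discount factor) are all illustrative assumptions.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q-value.

    alpha and gamma are assumed values chosen only for illustration.
    """
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Discounted-return target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical example: states are tasks, actions are candidate resources,
# and the reward is the negative processing time (an assumption).
q = defaultdict(float)
actions = ["r1", "r2"]
new_value = q_learning_update(q, "task_A", "r1", reward=-5.0,
                              next_state="task_B", actions=actions)
```

With an all-zero table, the update moves the entry for (`task_A`, `r1`) one tenth of the way toward the observed reward.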

3 Concepts and Definitions

3.1 Social Relation between Two Resources

Before computing the influence of the previous resources on the current candidate resources, the social relation factor between each pair of resources must be calculated. The social relation between $r_2$ and $r_1$ is defined as follows:

$$SR_{r_2,r_1} = \frac{t_{r_2,r_1} - t}{t} \qquad (1)$$

Here $t_{r_2,r_1}$ is the processing time of resource $r_2$ when it collaborates with $r_1$, and $t$ is the average processing time of resource $r_2$ for a given task.

If $r_1$ and $r_2$ collaborate well, $SR_{r_2,r_1}$ is negative, which means that $r_2$ needs less time than its average when it collaborates with $r_1$. If $SR_{r_2,r_1}$ is positive, the collaborating resources fail to help each other.
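
Formula (1) can be illustrated with a short sketch. The function name `social_relation` and the sample times are hypothetical and chosen only to show the sign convention; they do not come from the paper's data.

```python
def social_relation(t_pair, t_avg):
    """Social relation factor of formula (1).

    t_pair: processing time of r2 when it collaborates with r1.
    t_avg:  average processing time of r2 for the task.
    """
    if t_avg <= 0:
        raise ValueError("average processing time must be positive")
    return (t_pair - t_avg) / t_avg


# Hypothetical numbers: r2 averages 10 time units on the task.
sr_good = social_relation(8.0, 10.0)   # faster than average, negative factor
sr_bad = social_relation(12.0, 10.0)   # slower than average, positive factor
```

A negative result indicates that the pair works faster than the average of $r_2$, and a positive result indicates the opposite.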

3.2 The Influence of the Previous Resources

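According to formula (1), with $t$ now denoting the average processing time of resource $r_3$, the processing times $t_{r_3,r_2}$ and $t_{r_3,r_1}$ of $r_3$ when it collaborates with $r_2$ or $r_1$, respectively, are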

$$t_{r_3,r_2} = (1 + SR_{r_3,r_2}) \cdot t$$

$$t_{r_3,r_1} = (1 + SR_{r_3,r_1}) \cdot t$$

So the processing time $t_{r_3,r_2,r_1}$ of resource $r_3$ when it collaborates with both $r_2$ and $r_1$ is

$$t_{r_3,r_2,r_1} = \frac{t_{r_3,r_2} + t_{r_3,r_1}}{2} = \left(1 + \frac{SR_{r_3,r_2} + SR_{r_3,r_1}}{2}\right) \cdot t$$
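
The formulas of Section 3.2 can be checked numerically with a minimal sketch. The function names and the sample values are hypothetical; the sketch only confirms that averaging the two pairwise times gives the same result as applying the averaged social relation factors to the average processing time of $r_3$.

```python
import math


def pair_time(sr, t_avg):
    """Processing time of r3 with one collaborator: (1 + SR) * t."""
    return (1 + sr) * t_avg


def combined_time(sr_32, sr_31, t_avg):
    """Processing time of r3 with both r2 and r1: (1 + (SR_32 + SR_31) / 2) * t."""
    return (1 + (sr_32 + sr_31) / 2) * t_avg


# Hypothetical values: r3 averages 10 time units, works well with r2
# (SR = -0.2) and slightly worse than average with r1 (SR = 0.1).
t_avg = 10.0
t_32 = pair_time(-0.2, t_avg)
t_31 = pair_time(0.1, t_avg)
t_321 = combined_time(-0.2, 0.1, t_avg)

# Both sides of the final equation agree up to floating-point rounding.
assert math.isclose(t_321, (t_32 + t_31) / 2)
```

With these values the pairwise times are about 8 and 11 time units, so the combined time is about 9.5.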