Information Technology Reference
2.2 Reinforcement Learning Algorithm
Reinforcement learning is a class of algorithms in which an agent makes
decisions based on the feedback it receives from interacting with the
environment in its current state. Reinforcement learning algorithms are divided
into two types according to the optimization criterion of the MDP. One is
discounted-return reinforcement learning; the other is average-reward
reinforcement learning. TD(λ), Sarsa and Q-learning belong to the first type.
R-learning and H-learning belong to the second type. Reinforcement learning is
widely used to solve problems in different fields. For example, R-learning was
used to study parallel-machine scheduling problems that aimed to minimize the
flow time of jobs. Because R-learning is an average-reward method, it is not
suitable for workflow problems. Mehmet Emin Aydin et al. used Q-learning to
solve the job-shop scheduling problem. Zhengxing Huang et al. used Q-learning
to address work distribution problems in business process management.
These papers show that Q-learning is able to solve workflow problems
modeled as MDPs.
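As a point of reference for the discounted-return family, the standard tabular
Q-learning update can be sketched as follows. This is the textbook rule, not
necessarily the exact variant used in the cited papers; the state and action
labels, the reward value, and the `alpha` and `gamma` settings are illustrative
assumptions.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q-value.

    gamma is the discount factor that makes this a discounted-return method.
    """
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical example: choosing resource "r1" in state "s0" costs 5 time units.
q = defaultdict(float)  # all Q-values start at 0.0
actions = ["r1", "r2"]
new_value = q_update(q, "s0", "r1", reward=-5.0, next_state="s1", actions=actions)
```

Using the negative processing time as the reward is one common way to turn a
time-minimization problem into a reward-maximization one; it is assumed here
only for illustration.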
3 Concepts and Definitions
3.1 Social Relation between Two Resources
Before computing the influence of the previous resources on the current
candidate resources, the social relation factor between each pair of resources
must be calculated. The social relation between $r_2$ and $r_1$ is defined as
follows:
$$SR_{r_2,r_1} = \frac{t_{r_2,r_1} - \bar{t}}{\bar{t}} \qquad (1)$$
Here $t_{r_2,r_1}$ is the processing time of resource $r_2$ when it collaborates
with $r_1$, and $\bar{t}$ is the average processing time of resource $r_2$ for a
certain task. If $r_1$ and $r_2$ collaborate well, $SR_{r_2,r_1}$ is negative,
which means that $r_2$ needs less time than average when it collaborates with
$r_1$. If $SR_{r_2,r_1}$ is positive, the collaborating resources fail to
promote each other.
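The sign convention can be checked with a small sketch. It assumes the social
relation is the relative deviation of the collaborative processing time from
the average, $(t_{r_2,r_1} - \bar{t})/\bar{t}$; the function name and the
numbers are hypothetical.

```python
def social_relation(t_pair: float, t_avg: float) -> float:
    """Relative deviation of the collaborative time from the average time.

    t_pair: processing time of r2 when it collaborates with r1.
    t_avg:  average processing time of r2 for the task.
    """
    if t_avg <= 0:
        raise ValueError("average processing time must be positive")
    return (t_pair - t_avg) / t_avg


# Hypothetical data: r2 averages 10 time units on the task.
sr_good = social_relation(8.0, 10.0)   # faster than average, so negative
sr_bad = social_relation(12.0, 10.0)   # slower than average, so positive
```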
3.2 The Influence of the Previous Resources
According to formula (1), the processing times of resource $r_3$ when it
collaborates with $r_2$ or with $r_1$ are
$$t_{r_3,r_2} = (1 + SR_{r_3,r_2})\,\bar{t}$$
$$t_{r_3,r_1} = (1 + SR_{r_3,r_1})\,\bar{t}$$
So the processing time $t_{r_3,r_2,r_1}$ of resource $r_3$ when it collaborates
with both $r_2$ and $r_1$ is
$$t_{r_3,r_2,r_1} = t_{r_3,r_2} + t_{r_3,r_1} - \bar{t}$$
$$= (1 + SR_{r_3,r_2} + SR_{r_3,r_1})\,\bar{t}$$
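A minimal numeric sketch of the combined processing time follows. It assumes
the combined time equals $(1 + SR_{r_3,r_2} + SR_{r_3,r_1})$ times the average
processing time of $r_3$, so that each previous resource contributes its
relative deviation once; the function name and values are hypothetical.

```python
def combined_time(t_avg: float, sr_values: list[float]) -> float:
    """Processing time of a resource given its social relations to predecessors.

    t_avg:     average processing time of the resource for the task.
    sr_values: social relation factors toward each previous resource.
    """
    return (1.0 + sum(sr_values)) * t_avg


# Hypothetical data: r3 averages 10 time units, works well with r2, poorly with r1.
t_avg = 10.0
sr_32, sr_31 = -0.2, 0.1

t_32 = (1.0 + sr_32) * t_avg                    # time with r2 only
t_31 = (1.0 + sr_31) * t_avg                    # time with r1 only
t_321 = combined_time(t_avg, [sr_32, sr_31])    # time with both r2 and r1
```

Under this assumption the combined time equals the sum of the two pairwise
times minus one average processing time, since the average would otherwise be
counted twice.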