4.2 Action State
Let A be the set of actions, with A = R, where R is the set of resources in the workflow model. Let a(s) = candidates(s.t) be the set of optional actions in state s, where s.t denotes the task t being allocated in state s.
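As a rough illustration of this definition, the minimal Python sketch below uses hypothetical Task and State containers (the field names are illustrative, not taken from the paper) and returns the candidate resources for the task being allocated in a state:

```python
from dataclasses import dataclass, field
from typing import FrozenSet

@dataclass(frozen=True)
class Task:
    name: str
    candidates: FrozenSet[str] = field(default_factory=frozenset)  # resources qualified for this task

@dataclass(frozen=True)
class State:
    t: Task  # s.t: the task being allocated in this state

def actions(s: State) -> FrozenSet[str]:
    """a(s) = candidates(s.t): the resources that may take task s.t."""
    return s.t.candidates
```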
4.3 Reward Function
In this paper, the objective is to minimize the flow time of the business process cases, so the reward function is defined as:
$$ r = \begin{cases} \dfrac{1}{\sigma.\mathit{fl}}, & (t \text{ is None}) \\[4pt] 0, & (\text{else}) \end{cases} \qquad (3) $$
where t is the task to be allocated, σ is the current determinative case, and σ.fl is the flow time of that case. The shorter the flow time σ.fl, the higher the immediate payoff r. The agent receives the reward when the last work item of the case is completed; otherwise the reward is zero.
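As a rough illustration of Eq. (3), the following sketch computes the immediate reward for one allocation step; the Case class and its fl field are hypothetical names used here only for illustration:

```python
from typing import Optional

class Case:
    """Hypothetical container for a running process case (sigma)."""
    def __init__(self, flow_time: float):
        self.fl = flow_time  # sigma.fl: flow time of the case

def reward(t: Optional[str], sigma: Case) -> float:
    """Eq. (3): pay 1/sigma.fl once the case finishes (no task left), else 0."""
    if t is None:                # last work item completed, no next task to allocate
        return 1.0 / sigma.fl    # shorter flow time -> higher immediate payoff
    return 0.0
```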
5 Q-learning Algorithm for MDPs
Q-learning is well suited to solving problems modeled as MDPs [15]. In this section, the proposed Q-learning algorithm is described in detail. In order to verify the influence of the previous resources on the candidate resources, the Q-learning algorithm without SR is used as the baseline experiment. The first part describes this baseline, and the second part introduces the Q-learning algorithm with SR.
5.1 The Q-learning Algorithm without SR
In order to realize the Q-learning algorithm, the evaluation function Q(s, a) and the estimation function Q̂(s, a) are defined [15]. According to the MDP, the evaluation function Q(s, a) can be written as:
$$ Q(s, a) \equiv r(s, a) + \gamma \max_{a'} Q(\delta(s, a), a') \qquad (4) $$
We can use Q̂ to approximate the Q function; the agent then updates Q̂(s, a) according to the following rule:
$$ \hat{Q}(s, a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a') \qquad (5) $$
For the situation of nondeterministic rewards and actions, formula (5) is replaced by the following formula:
$$ \hat{Q}_n(s, a) \leftarrow (1 - \alpha_n)\, \hat{Q}_{n-1}(s, a) + \alpha_n \left[ r + \gamma \max_{a'} \hat{Q}_{n-1}(s', a') \right] \qquad (6) $$

where

$$ \alpha_n = \frac{1}{1 + \mathit{visits}_n(s, a)} $$
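The update in Eq. (6) can be sketched as follows, assuming a tabular Q̂ keyed by (state, action) pairs; the dictionary-based table, the discount value, and the function name are illustrative assumptions rather than the paper's implementation:

```python
from collections import defaultdict

# Tabular estimates and visit counters, keyed by (state, action) pairs.
Q = defaultdict(float)       # Q̂(s, a), initialised to 0
visits = defaultdict(int)    # visits_n(s, a): times the pair (s, a) has been visited
GAMMA = 0.9                  # discount factor gamma (illustrative value)

def q_update(s, a, r, s_next, actions_next):
    """One application of Eq. (6) with alpha_n = 1 / (1 + visits_n(s, a))."""
    visits[(s, a)] += 1
    alpha = 1.0 / (1 + visits[(s, a)])
    # max over the candidate actions (resources) available in the next state;
    # 0 if the case is finished and no further allocation is needed
    best_next = max((Q[(s_next, a2)] for a2 in actions_next), default=0.0)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + GAMMA * best_next)
```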