agent can take. For a discretely perceived world state $s \in S$, the $i$-th module is fed with a subset $s_i \subseteq s$ which is expected to be relevant to achieve its goal. For the sake of simplicity, all notation will refer to structures and functions existing in all agents, and superscripts indicating the agent index will not be added. Let us denote as $Q_i(s_i, a)$ the Q matrix entry of the $i$-th module relating its partial state $s_i$ and action $a$. Termination conditions are defined as $k$ subsets $S_t^i \subseteq S$, where $i = 1, \ldots, k$, satisfying $\bigcup_{j=1}^{k} S_t^j = S_t \subseteq S$.
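To make the notation concrete, the sketch below shows one possible way of holding a module's Q matrix $Q_i(s_i, a)$ indexed by the partial state and the action. It is a minimal illustration assuming a dictionary-backed table; the class name `QModule`, the `relevant_vars` projection and all other identifiers are assumptions made for the example, not part of the original formulation.

```python
from collections import defaultdict

class QModule:
    """One learning module: a Q table over its own partial view of the state."""

    def __init__(self, relevant_vars, actions):
        self.relevant_vars = relevant_vars  # indices of the state variables forming s_i
        self.actions = actions              # shared discrete action set
        self.q = defaultdict(float)         # Q_i(s_i, a), lazily initialised to 0

    def extract(self, s):
        """Project the full discrete state s onto this module's partial state s_i."""
        return tuple(s[v] for v in self.relevant_vars)

    def value(self, s, a):
        """Return Q_i(s_i, a) for the partial state derived from s."""
        return self.q[(self.extract(s), a)]
```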
2.1 Reward and Termination Functions
Most monolithic Q-Learning systems involve the construction of modeling functions that can be expressed as a series of IF-rules: a reward function $r : S \to \mathbb{R}$ and a termination function $t : S \to \{\mathit{true}, \mathit{false}\}$:

$$
r(s) = \begin{cases}
r_1 & \text{if } s \in S_t^1 \\
\vdots & \vdots \\
r_k & \text{if } s \in S_t^k \\
r_0 & \text{else}
\end{cases}, \qquad
t(s) = \begin{cases}
t_1 & \text{if } s \in S_t^1 \\
\vdots & \vdots \\
t_k & \text{if } s \in S_t^k \\
\mathit{false} & \text{else}
\end{cases},
$$

where $S_t = \bigcup_{i=1}^{k} S_t^i$, and $r_0$ represents the selected neutral reward.
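As a rough illustration of this IF-rule structure, the following sketch scans the termination subsets $S_t^1, \ldots, S_t^k$ in order; the list-of-sets representation, the example states and the names `terminal_subsets` and `R0` are hypothetical choices made only for the example.

```python
# Hypothetical data: k termination subsets S_t^i with their rewards r_i and flags t_i.
terminal_subsets = [
    ({("goal",)}, 1.0, True),    # (S_t^1, r_1, t_1)
    ({("crash",)}, -1.0, True),  # (S_t^2, r_2, t_2)
]
R0 = 0.0  # selected neutral reward r_0

def reward(s):
    """Monolithic reward r(s): first matching IF-rule wins, otherwise r_0."""
    for subset, r_i, _ in terminal_subsets:
        if s in subset:
            return r_i
    return R0

def terminates(s):
    """Monolithic termination t(s): true only inside some S_t^i."""
    for subset, _, t_i in terminal_subsets:
        if s in subset:
            return t_i
    return False
```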
The modular approach decomposes these functions into $m$ reward functions $r_i : S \to \mathbb{R}$ and $m$ termination functions $t_i : S \to \{\mathit{true}, \mathit{false}\}$:

$$
r_i(s) = \begin{cases}
r_i & \text{if } s_i \in S_t^i \\
r_0 & \text{else}
\end{cases}, \qquad
t_i(s) = \begin{cases}
t_i & \text{if } s_i \in S_t^i \\
\mathit{false} & \text{else}
\end{cases}.
$$
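Under this decomposition each module tests only its own partial state against its own termination subset. A minimal sketch of one module's reward and termination pair, with hypothetical field names and defaults chosen for the example:

```python
class ModuleTask:
    """Reward r_i(s) and termination t_i(s) for one module, evaluated on s_i only."""

    def __init__(self, terminal_partial_states, r_i, t_i=True, r_0=0.0):
        self.terminal_partial_states = terminal_partial_states  # S_t^i as partial states
        self.r_i, self.t_i, self.r_0 = r_i, t_i, r_0

    def reward(self, s_i):
        # r_i(s) = r_i if s_i is in S_t^i, else the neutral reward r_0
        return self.r_i if s_i in self.terminal_partial_states else self.r_0

    def terminates(self, s_i):
        # t_i(s) = t_i if s_i is in S_t^i, else false
        return self.t_i if s_i in self.terminal_partial_states else False
```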
Fig. 1. Modular Q-Learning Scheme
The typical approach to learn different sub-tasks or behaviors concurrently involves using a Module Mediator (also referred to as Module Arbiter in the literature) responsible for action selection, as represented in Figure 1. Each module has its own Q matrix representing its partial knowledge of the world state $s_i$, and modules may even compete, imposing their preferences on the rest. To select the next action we will follow the Greatest Mass (GM) strategy [12], defined as:
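A commonly used formulation of Greatest Mass selects the action whose Q-values, summed over all modules, are largest, $a^{*} = \arg\max_{a} \sum_{i} Q_i(s_i, a)$. The sketch below assumes that formulation and reuses the hypothetical `QModule` class from the earlier example; it is illustrative only.

```python
def greatest_mass_action(modules, s):
    """Module mediator: pick argmax_a of the summed module Q-values (GM strategy)."""
    def mass(a):
        return sum(m.value(s, a) for m in modules)
    # All modules are assumed to share the same discrete action set.
    return max(modules[0].actions, key=mass)
```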
 