Social Welfare for Automatic Innovation - Multiagent System Technologies

technique for grouping states (Soft State Aggregation, or SSA), which allows them
to represent an infinite space of continuous values, with D dimensions, in a
D-dimensional finite space (clusters) [17]. For the case study, the environment
state space is seen as a D-dimensional matrix that groups the states (in
$\mathbb{R}^D$) in an exponential way. The discrete position (or index of the
cluster) i for the continuous variable x is computed as:

$$ i = \min\left(\log_2(x + 1),\ \omega\right) $$

Here i is bounded by a number ω, chosen near the maximum value expected for that
variable, for each of the D dimensions taken into account in the environment
representation. Therefore, the total set of states which every agent must
represent has size $\omega^D \times |A|$. The number of actions $|A|$ is known
by every agent (although it may change with time).
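
As a rough illustration of the exponential grouping described above, the sketch below maps a continuous value to a cluster index and computes the table size $\omega^D \times |A|$. It assumes that the logarithm is rounded down and that indices run from 0 to ω − 1; the formula as printed does not show any rounding, and this choice was made only because it places every x ≥ 15 in the last cluster when ω = 5, as the experiment reports. The names `cluster_index`, `discretize` and `table_size` are hypothetical.

```python
import math
from typing import Sequence, Tuple


def cluster_index(x: float, omega: int) -> int:
    """Map a non-negative continuous value x to a discrete cluster index.

    Assumption (not stated in the source): floor(log2(x + 1)) capped at
    omega - 1, giving omega clusters per dimension (indices 0 .. omega - 1).
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    return min(math.floor(math.log2(x + 1)), omega - 1)


def discretize(state: Sequence[float], omega: int) -> Tuple[int, ...]:
    """Apply the same grouping independently to each of the D dimensions."""
    return tuple(cluster_index(x, omega) for x in state)


def table_size(omega: int, dims: int, n_actions: int) -> int:
    """Number of entries an agent must represent: omega**dims * n_actions."""
    return omega ** dims * n_actions
```

Under this assumption and with ω = 5, the intervals [0, 1), [1, 3), [3, 7), [7, 15) and [15, +∞) fall into clusters 0 through 4, and `table_size(5, 5, 4)` gives the 12500 entries quoted for the experiment.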

Two different types of agents are distinguished: those that can receive and
understand social opinions (executing Social-Welfare RL) and those that cannot
(using standard Q-learning).
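
For reference, the standard tabular Q-learning update used by the non-social agents can be sketched as follows. The learning rate `alpha`, the discount factor `gamma` and the dictionary-based table are assumptions made for illustration; the source does not give these values, and the Social-Welfare RL variant is not shown here.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, int], float]


def q_update(q: QTable, state: Hashable, action: int, reward: float,
             next_state: Hashable, actions: Sequence[int],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One standard Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    """
    # Value of the best action available in the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Usage: states are tuples of cluster indices, actions are integers 0..3.
q: QTable = defaultdict(float)
q_update(q, (0, 0, 0, 2, 1), 3, reward=1.0,
         next_state=(0, 0, 0, 3, 1), actions=range(4))
```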

Learning of the Q-table was done individually for both the social-aware agents
and the non-social ones. Parameter ω was fixed at 5, which means that the last
state for each dimension represents values of $x \in [15, +\infty[$. The
dimensions (or variables) taken into account to represent the environment are
the number of products of each kind in the market, plus the agent's life and
balance.

Three types of products were loaded: Wheat, with no dependencies and needing
1 cycle; Flour, requiring 2 units of Wheat and 2 cycles; and Bread, edible,
providing 10 units of life and requiring 2 units of Flour and 2 cycles.

The actions available to the agents are the creation of any kind of product,
plus another one called eating. The total space of representation needed by each
agent is only $5^{(3+2)} \times 4 = 12500$ states. The learning stage took
1,000,000 cycles, with a probability of exploration k = 1, hoping to explore as
many states as possible, as many times as possible. Every agent starts with
exactly the same Q-matrix at the beginning of the simulation. The experiment
includes three different scenarios, so there are two significant changes. The
scenarios are:

1. No changes in the environment.
2. Relaxing the production rules: Flour needs 1 unit of time and 1 unit of
Wheat; Bread needs 1 unit of time and 1 unit of Flour.
3. Hardening the production rules: Wheat needs 2 units of time; Flour needs
2 units of time and 3 units of Wheat; Bread stays as originally, needing 2
units of time and 2 units of Flour.
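
The three scenarios can be written down as recipe tables, which makes the differences between them easy to check. This is only a sketch: the `Recipe` type, its field names and the `SCENARIOS` dictionary are hypothetical, the baseline Flour recipe is taken to consume Wheat, and Wheat is assumed to keep its baseline 1 cycle in the relaxed scenario because the text does not say otherwise.

```python
from typing import Dict, List, NamedTuple, Optional


class Recipe(NamedTuple):
    cycles: int                   # units of time needed to produce one unit
    needs: Optional[str] = None   # product consumed as input, if any
    amount: int = 0               # units of the input consumed


BASELINE: Dict[str, Recipe] = {
    "Wheat": Recipe(cycles=1),
    "Flour": Recipe(cycles=2, needs="Wheat", amount=2),
    "Bread": Recipe(cycles=2, needs="Flour", amount=2),
}

SCENARIOS: Dict[int, Dict[str, Recipe]] = {
    1: BASELINE,  # no changes in the environment
    2: {**BASELINE,  # relaxed production rules
        "Flour": Recipe(cycles=1, needs="Wheat", amount=1),
        "Bread": Recipe(cycles=1, needs="Flour", amount=1)},
    3: {**BASELINE,  # hardened rules; Bread stays as in the baseline
        "Wheat": Recipe(cycles=2),
        "Flour": Recipe(cycles=2, needs="Wheat", amount=3)},
}


def changed_products(scenario: int) -> List[str]:
    """Names of products whose recipe differs from the baseline."""
    return sorted(name for name, recipe in SCENARIOS[scenario].items()
                  if recipe != BASELINE[name])
```

Calling `changed_products(2)` and `changed_products(3)` lists which recipes an agent trained on the baseline would find altered in each scenario.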

As a remark, it is important to control the value of κ so that it never reaches
0. To that end, the computation of κ is corrected by an ε → 0 which is
representable by a machine, so that ε ≤ κ ≤ 1.
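
One simple way to enforce such a bound is to clamp κ after it is computed. The sketch below assumes the lower bound is the machine epsilon of Python floats (`sys.float_info.epsilon`); the source only says that the bound is a machine-representable value close to 0, so this choice, like the name `clamp_kappa`, is an assumption.

```python
import sys

# Assumed lower bound: the difference between 1.0 and the next
# representable float. Any small positive float would serve the same role.
EPS = sys.float_info.epsilon


def clamp_kappa(kappa: float, eps: float = EPS) -> float:
    """Keep kappa within [eps, 1] so that it can never reach 0."""
    return max(eps, min(1.0, kappa))
```

Values already inside the interval pass through unchanged, so the correction only acts when the computed κ would otherwise be 0 or below, or above 1.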

The social-aware agents are expected to adapt better than the non-social ones,
which use traditional reinforcement learning (SSA Q-learning).
