Although the agent is learning, it cannot distinguish between what has not been learned yet and what has changed in the environment. It is therefore desirable to stop learning (or to learn at a slower pace) once the agent has a good approximation of the real functions. The probability of exploration (k) can help if it is made a function of the uncertainty about these functions.
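As a rough, generic sketch of this idea (the concrete mechanism used here, via the KAI index and the OCC emotions, is developed below), the exploration probability can be tied to a running uncertainty estimate such as the recent temporal-difference error; all names and constants in this snippet are assumptions for illustration only:

```python
from collections import deque

class UncertaintyDrivenExploration:
    """Illustrative only: an exploration probability k that grows with an
    uncertainty proxy (here the mean absolute TD error over recent steps)."""

    def __init__(self, window=50, k_min=0.01, k_max=0.5):
        self.errors = deque(maxlen=window)          # recent |TD error| values
        self.k_min, self.k_max = k_min, k_max

    def update(self, td_error):
        self.errors.append(abs(td_error))

    def k(self):
        if not self.errors:
            return self.k_max                        # nothing learned yet: explore a lot
        u = sum(self.errors) / len(self.errors)      # uncertainty proxy
        u = min(1.0, u)                              # assumes roughly unit-scale rewards
        return self.k_min + (self.k_max - self.k_min) * u
```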
Following the previous model (Section 3), the KAI index will represent k, and it has to be computed from the emotional response of each agent. The emotional response must therefore include a means to feel (compute) the uncertainty of the learned functions.
Taking into account, also from the model, that the agent should give a negative opinion when its perception of the situation is bad (and a positive one when it is good), the emotional response will also need a way to express these outcomes when executing actions in the environment. To fulfill these two requirements, two of the OCC emotions will be directly linked to the computation of the KAI index k: the joy/distress of the agent, denoted J, and the hope/fear, denoted H.
The functions given below for J and H will define a pessimistic agent, i.e. when changes occur, the agent will feel distress more easily than joy. That will enable the agent to move faster into different policies, some of which could improve the situation (and thus make the agent feel joy). The disadvantage is that the agent will need more time to converge to a stable policy. In human terms, the agent's behavior could be described as cautious.
The functions which compute J and H are shown below:

J = (Q_{t+1} − Q_t) / (2 max(|Q_{t+1}|, |Q_t|) + ε)

H = (Q_{t+1} − M_p) / (2 max(|Q_{t+1}|, |M_p|) + ε)

M_p = (1/p) Σ_{i=t−p}^{t−1} Q_i
Notation: Q_t means the discounted reward the agent¹ has learned so far at timestep t for the given state and the taken action, such that Q_t ≡ Q_t(s_t, a_t). The normalization is done by the maximum of the pair Q_{t+1}, Q_t; then J ∈ ]−1, +1[. M_p is the average of the last p steps, and H ∈ ]−1, +1[ is the agent's hope to improve its performance, remembering only the p last values. ε > 0 is a small constant, to avoid division by 0 in J and H.
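A minimal sketch of these two computations, assuming a small constant ε (written eps) in the denominators and a plain list holding the previously learned Q-values, could look as follows; the function names are illustrative:

```python
def joy_distress(q_next, q_curr, eps=1e-6):
    """J in ]-1, +1[: positive when the Q-value just improved, negative when it dropped."""
    return (q_next - q_curr) / (2 * max(abs(q_next), abs(q_curr)) + eps)

def hope_fear(q_next, q_history, p, eps=1e-6):
    """H in ]-1, +1[: compares the new Q-value with M_p, the average of the last p values."""
    window = q_history[-p:]                 # the p most recent learned Q-values
    m_p = sum(window) / len(window)         # M_p
    return (q_next - m_p) / (2 * max(abs(q_next), abs(m_p)) + eps)
```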
The next step is to define the social opinion function. The agent will give its opinion based on the satisfaction it experiences, so it seems logical to describe a function for the representation of the OCC emotion "satisfaction" and use its value as the opinion as well. First, let us define a prospective happiness value, as a function of J and H, as Ψ = (JH + J)/2. This definition of Ψ allows an agent to give a positive opinion when it feels joy and feels no fear (J → 1, H → 1). The agent will give a negative opinion when it feels distress and has fear (J → −1, H → −1). The limit values of Ψ are shown in Table 1 (also, in a static environment, when t → ∞, it is expected to find J → 0 and H → 0, by their definitions).
¹ Subscripts have been omitted for the sake of clarity; e.g. when it is said Q(s, a), it really means Q(s, a)_{A_i}, for the agent A_i. This holds for every parameter in the section.