As said before, R_Σ resembles the OCC "satisfaction / fears-confirmed" emotion pair. The resemblance holds at the semantic level as well as in the variables needed for the emotional response. This resemblance induces the definition of R_Σ to be the same as the definition of S (eq. 1), i.e., R_Σ = S. This way the personality, which is invariant for a given agent, includes how the agent exposes an opinion, without the need for a different function. The model (section 3.2) kept R_Σ apart from the personality in case a different function is more convenient for another purpose.
The total social opinion will be used by the κ function to compute the KAI index. The total social welfare, noted σ for short, is computed as:

\[
\sigma = \sum_{A_i} R_\Sigma(A_i) \tag{2}
\]

for every agent A_i, including itself.
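As an illustration, eq. (2) amounts to summing the agent's emotional response over the whole agent population; a minimal Python sketch, where the callable r_sigma and the collection agents are hypothetical placeholders rather than part of the model's implementation:

```python
# Minimal sketch of eq. (2): the total social welfare sigma is the sum of the
# emotional response R_Sigma evaluated for every agent, the evaluating agent
# included. `r_sigma` and `agents` are hypothetical placeholders.
def total_social_welfare(r_sigma, agents):
    return sum(r_sigma(a) for a in agents)
```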
For the KAI index calculation, carried out by the κ function, we propose a
time sequence in which both the last value of κ and the total social welfare σ
from eq. 2 are used:
\[
\kappa \equiv k_{t+1} = \frac{k_t}{k_t + 2^{\,k_t(\sigma - 2)}} \tag{3}
\]
The sequence is similar to a sigmoid when centered around σ = 0, but converges differently (see figure 2(b) for the κ sequence with t = 500). κ is not very sensitive to the initial value k_0, as long as k_0 ∈ ]0, 1] (with k_0 = 0 it holds that k_i = 0, ∀i, regardless of the values of σ).
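For illustration, the recurrence (3) can be iterated directly. The sketch below assumes σ is held fixed during the iteration (in the model σ itself evolves through eq. (2)) and illustrates the weak dependence on k_0:

```python
# Sketch of the kappa sequence of eq. (3), with sigma held fixed during the
# iteration (in the model sigma changes over time via eq. (2)).
def kappa_sequence(sigma, k0=0.5, steps=500):
    """Iterate k_{t+1} = k_t / (k_t + 2 ** (k_t * (sigma - 2)))."""
    k = k0
    for _ in range(steps):
        k = k / (k + 2.0 ** (k * (sigma - 2.0)))
    return k

# Any k0 in ]0, 1] reaches (numerically) the same limit; k0 = 0 stays at 0.
print(kappa_sequence(0.0, k0=0.1), kappa_sequence(0.0, k0=1.0), kappa_sequence(0.0, k0=0.0))
```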
4.1 Convergence of Learning
Convergence of the learning process is proven under some assumptions:
- The agent represents the universe without taking other agents into account. At least, the agent has a representation function that can represent the universe without ambiguity, with a high probability that increases with time.
- The agent knows the set of actions available at time t .
- The agent observes the response of the system to its actions.
To prove that the algorithm converges, the k_t sequence (eq. 3) has to converge as well (a necessary condition). If it does, the Q-learning algorithm behaves as usual, thus converging to the optimal policies (under the former assumptions) if all the states are visited often enough. It is proven in [15] that:
\[
\lim_{t \to \infty} k_t = 1 - \frac{W\!\left(\ln(2)\, 2^{\sigma-2}\, (\sigma - 2)\right)}{(\sigma - 2)\,\ln(2)} \tag{4}
\]
In (4), W(x) stands for the Lambert-W function, such that for every number x ∈ R, x = W(x) e^{W(x)}.
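A small numerical sketch, assuming σ is held fixed and using SciPy's lambertw, can be used to compare the closed form (4) with the iterated sequence (3):

```python
# Numerical check of eq. (4) against the iteration of eq. (3), with sigma
# held fixed and sigma != 2 (the closed form is indeterminate at sigma = 2).
import numpy as np
from scipy.special import lambertw

def kappa_limit(sigma):
    """Closed form of eq. (4): 1 - W(ln(2) * 2**(sigma-2) * (sigma-2)) / ((sigma-2) * ln(2))."""
    a = sigma - 2.0
    return 1.0 - lambertw(np.log(2.0) * 2.0 ** a * a).real / (a * np.log(2.0))

def kappa_iterate(sigma, k0=0.5, steps=5000):
    k = k0
    for _ in range(steps):
        k = k / (k + 2.0 ** (k * (sigma - 2.0)))
    return k

for sigma in (-2.0, -1.0, 0.0):
    print(sigma, kappa_limit(sigma), kappa_iterate(sigma))
```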
The proof of the convergence of the learning algorithm follows intuitively
from the proof of Q-learning convergence [16]: The agents will learn a policy