As said before, R_Σ resembles the OCC "satisfaction / fears-confirmed" emotion pair. The resemblance holds at the semantic level as well as in the variables needed for the emotional response. This resemblance induces the definition of R_Σ to be the same as the definition of S (eq. 1), i.e., R_Σ = S. This way the personality, which is invariant for a given agent, includes how the agent exposes an opinion, without the need for a different function. The model (section 3.2) kept R_Σ apart from the personality in case a different function is more convenient for another purpose.
The total social opinion will be used by the κ function to compute the KAI index. The total social welfare, noted σ for short, is computed as:

\[
\sigma = \sum_{A_i} R_\Sigma(A_i) \tag{2}
\]

for every agent A_i, including itself.
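As an illustration, eq. (2) amounts to summing the agent's emotional response over the whole agent population; a minimal Python sketch, where the callable r_sigma and the collection agents are hypothetical placeholders rather than part of the model's implementation:

```python
# Minimal sketch of eq. (2): the total social welfare sigma is the sum of the
# emotional response R_Sigma evaluated for every agent, the evaluating agent
# included. `r_sigma` and `agents` are hypothetical placeholders.
def total_social_welfare(r_sigma, agents):
    return sum(r_sigma(a) for a in agents)
```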
For the KAI index calculation, carried out by the κ function, we propose a
time sequence in which both the last value of κ and the total social welfare σ
from eq. 2 are used:
\[
\kappa \equiv k_{t+1} = \frac{k_t}{k_t + 2^{\,k_t(\sigma - 2)}} \tag{3}
\]
The sequence is similar to a sigmoid when centered around σ = 0, but converges differently (see figure 2(b) for the κ sequence with t = 500). κ is not very sensitive to the initial value k_0, as long as k_0 ∈ ]0, 1] (with k_0 = 0 it holds that k_i = 0, ∀i, regardless of the values of σ).
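For illustration, the recurrence (3) can be iterated directly. The sketch below assumes σ is held fixed during the iteration (in the model σ itself evolves through eq. (2)) and illustrates the weak dependence on k_0:

```python
# Sketch of the kappa sequence of eq. (3), with sigma held fixed during the
# iteration (in the model sigma changes over time via eq. (2)).
def kappa_sequence(sigma, k0=0.5, steps=500):
    """Iterate k_{t+1} = k_t / (k_t + 2 ** (k_t * (sigma - 2)))."""
    k = k0
    for _ in range(steps):
        k = k / (k + 2.0 ** (k * (sigma - 2.0)))
    return k

# Any k0 in ]0, 1] reaches (numerically) the same limit; k0 = 0 stays at 0.
print(kappa_sequence(0.0, k0=0.1), kappa_sequence(0.0, k0=1.0), kappa_sequence(0.0, k0=0.0))
```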
4.1 Convergence of Learning
Convergence of the learning process is proven under some assumptions:
- The agent represents the universe without taking other agents into account. At least, the agent has a representation function that can represent the universe without ambiguity, with a high probability that increases with time.
- The agent knows the set of actions available at time t .
- The agent observes the response of the system to its actions.
To prove that the algorithm converges, the k_t sequence (eq. 3) has to converge as well (a necessary condition). If it does, the Q-learning algorithm behaves as usual, thus converging to the optimal policies (under the former assumptions) if all the states are visited often enough. It is proven in [15] that:
\[
\lim_{t \to \infty} k_t = 1 - \frac{W\!\left(\ln(2)\, 2^{\sigma-2}\, (\sigma - 2)\right)}{(\sigma - 2)\,\ln(2)} \tag{4}
\]
In (4), W(x) stands for the Lambert-W function, such that for every number x ∈ R, x = W(x) e^{W(x)}.
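A small numerical sketch, assuming σ is held fixed and using SciPy's lambertw, can be used to compare the closed form (4) with the iterated sequence (3):

```python
# Numerical check of eq. (4) against the iteration of eq. (3), with sigma
# held fixed and sigma != 2 (the closed form is indeterminate at sigma = 2).
import numpy as np
from scipy.special import lambertw

def kappa_limit(sigma):
    """Closed form of eq. (4): 1 - W(ln(2) * 2**(sigma-2) * (sigma-2)) / ((sigma-2) * ln(2))."""
    a = sigma - 2.0
    return 1.0 - lambertw(np.log(2.0) * 2.0 ** a * a).real / (a * np.log(2.0))

def kappa_iterate(sigma, k0=0.5, steps=5000):
    k = k0
    for _ in range(steps):
        k = k / (k + 2.0 ** (k * (sigma - 2.0)))
    return k

for sigma in (-2.0, -1.0, 0.0):
    print(sigma, kappa_limit(sigma), kappa_iterate(sigma))
```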
The proof of the convergence of the learning algorithm follows intuitively
from the proof of Q-learning convergence [16]: The agents will learn a policy