As said before, $R_\Sigma$ resembles the OCC "satisfaction / fears-confirmed" emotion pair. The resemblance holds at the semantic level as well as in the variables needed for the emotional response. This resemblance induces the definition of $R_\Sigma^S$ to be the same as the definition of $R_\Sigma$ (eq. 1). This way the personality, which is invariant for an agent, includes how the agent exposes an opinion, without the need for a different function. The model (section 3.2) kept $R_\Sigma$ apart from the personality in case a different function seems more convenient for a different purpose.
The total social opinion will be used by the $\kappa$ function to compute the KAI index. The total social welfare, noted as $\sigma$ for short, is computed as:

$$ \sigma = \sum_{A_i} R_\Sigma^S(A_i) \qquad (2) $$

for every agent $A_i$, including itself.
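In code, eq. 2 is a plain sum over the agents. A minimal sketch, where `r_sigma_s` stands in for the paper's social response function (eq. 1 is not reproduced here, so a toy opinion table is used instead):

```python
# Sketch of eq. 2: the total social welfare sigma is the sum of the
# social response over every agent A_i, including the evaluating agent.
def total_social_welfare(agents, r_sigma_s):
    # r_sigma_s is a placeholder for the paper's R_Sigma^S function
    return sum(r_sigma_s(a) for a in agents)

# Toy stand-in values for the social responses (illustrative only):
opinions = {"A1": 0.4, "A2": -0.2, "A3": 0.1}
sigma = total_social_welfare(opinions, opinions.get)
print(sigma)  # sum of the three toy responses
```

Passing the response function as a parameter mirrors the model's separation of $R_\Sigma$ from the personality: a different function can be swapped in without changing the aggregation.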
For the KAI index calculation, carried out by the $\kappa$ function, we propose a time sequence in which both the last value of $\kappa$ and the total social welfare $\sigma$ from eq. 2 are used:

$$ \kappa \equiv k_{t+1} = \frac{k_t}{k_t + 2^{k_t(\sigma - 2)}} \qquad (3) $$
The sequence is similar to a sigmoid when centered around $\sigma = 0$, but converges differently (see figure 2(b) for the $\kappa$ sequence with $t = 500$). $\kappa$ is not very sensitive to the initial value $k_0$, as long as $k_0 \in \,]0, 1]$ (with $k_0 = 0$ it holds that $k_i = 0, \forall i$, regardless of the values of $\sigma$).
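These remarks are easy to check numerically. A minimal sketch of the recurrence in eq. 3, with $\sigma$ taken as a fixed parameter:

```python
# k_{t+1} = k_t / (k_t + 2^(k_t * (sigma - 2))), eq. 3
def kappa_step(k, sigma):
    return k / (k + 2.0 ** (k * (sigma - 2.0)))

def kappa_sequence(k0, sigma, t=500):
    # iterate eq. 3 for t steps from the initial value k0
    k = k0
    for _ in range(t):
        k = kappa_step(k, sigma)
    return k

# k_0 = 0 is absorbing: every k_i stays 0 regardless of sigma
print(kappa_sequence(0.0, 5.0))  # 0.0
# any k_0 in ]0, 1] reaches the same limit for a given sigma
print(kappa_sequence(0.1, 0.0), kappa_sequence(1.0, 0.0))
```

Note that $k_0 = 0$ makes the numerator zero at every step, which is why the sequence is then identically zero whatever $\sigma$ is.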
4.1 Convergence of Learning
Convergence of the learning process is proven under some assumptions:
- The agent represents the universe without taking other agents into account. At least, the agent has a representation function which can represent the universe without ambiguity, with a high probability that increases with time.
- The agent knows the set of actions available at time $t$.
- The agent observes the response of the system to its actions.
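These assumptions describe the standard setting in which tabular Q-learning converges. As a hedged illustration, the sketch below runs the usual Q-learning update on an invented 3-state chain; the states, rewards, and parameters are not from the paper:

```python
import random

GAMMA, ALPHA = 0.9, 0.5          # illustrative discount and learning rate
ACTIONS = ("L", "R")

def step(s, a):
    """Invented deterministic chain: reach state 2 (terminal) for reward 1."""
    s2 = min(s + 1, 2) if a == "R" else max(s - 1, 0)
    return s2, 1.0 if (a == "R" and s == 1) else 0.0

random.seed(0)
Q = {(s, a): 0.0 for s in range(3) for a in ACTIONS}
s = 0
for _ in range(5000):
    a = random.choice(ACTIONS)   # random exploration visits every pair
    s2, r = step(s, a)
    best_next = 0.0 if s2 == 2 else max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
    s = 0 if s2 == 2 else s2     # restart once the terminal state is hit

# Greedy policy after learning: move right from both non-terminal states
print(max(ACTIONS, key=lambda a: Q[(0, a)]), max(ACTIONS, key=lambda a: Q[(1, a)]))
```

Because the toy chain is deterministic and random exploration visits every state-action pair, the Q-values settle at their optimal values, matching the "all states visited often enough" condition invoked below.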
To prove that the algorithm converges, the $k_t$ sequence (eq. 3) has to converge as well (a necessary condition). If it does, the Q-learning algorithm will behave as usual, thus converging to the optimal policies (under the former assumptions) provided all the states are visited often enough. It is proven in [15] that:
$$ \lim_{t \to \infty} k_t = 1 - \frac{W\!\left(\ln(2)\, 2^{\sigma-2}\, (\sigma - 2)\right)}{(\sigma - 2)\, \ln(2)} \qquad (4) $$
In (4), $W(x)$ stands for the Lambert W function, such that for every number $x \in \mathbb{R}$, $x = W(x)\, e^{W(x)}$.
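The closed form in eq. 4 can be checked numerically against a direct iteration of eq. 3. The sketch below implements the principal branch $W_0$ with a plain Newton iteration to stay dependency-free (valid for $x \geq -1/e$, and the closed form assumes $\sigma \neq 2$):

```python
import math

def lambert_w(x, iters=60):
    """Principal branch W_0: returns w with w * exp(w) == x (x >= -1/e)."""
    w = 0.0
    for _ in range(iters):
        ew = math.exp(w)
        w -= (w * ew - x) / (ew * (w + 1.0))  # Newton step on w*e^w - x
    return w

def kappa_limit(sigma):
    # Closed form of eq. 4 (sigma != 2)
    a = (sigma - 2.0) * math.log(2.0)
    return 1.0 - lambert_w(math.log(2.0) * 2.0 ** (sigma - 2.0) * (sigma - 2.0)) / a

def kappa_iterated(k0, sigma, t=500):
    # Direct iteration of eq. 3
    k = k0
    for _ in range(t):
        k = k / (k + 2.0 ** (k * (sigma - 2.0)))
    return k

for sigma in (0.0, -2.0):
    print(kappa_limit(sigma), kappa_iterated(0.7, sigma))
```

For values of $\sigma$ where the positive fixed point is strongly attracting, the two columns agree to several decimals, which is a useful sanity check on both the recurrence and the closed form.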
The proof of the convergence of the learning algorithm follows intuitively
from the proof of Q-learning convergence [16]: The agents will learn a policy