After a turn, she updates her propensities as

$$w_{i,k}(t+1) = (1-\varphi_a)\, w_{i,k}(t) + \mathbf{1}_{\{s_i^k = s_i(t)\}}\, (1-\varphi_a)\, R,$$

where $R$ is a digital payoff, $\varphi_a$ is a positive constant called the learning parameter, and $s_i(t)$ is player $i$'s actually chosen strategy at turn $t$.
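The propensity update above can be sketched as follows; this is a minimal illustration, not the authors' implementation, and the function and argument names are my own:

```python
def update_propensities(w, chosen, payoff, phi_a):
    """Decay every propensity by (1 - phi_a), and add an extra
    (1 - phi_a) * payoff to the strategy actually chosen."""
    return [
        (1 - phi_a) * wk + ((1 - phi_a) * payoff if k == chosen else 0.0)
        for k, wk in enumerate(w)
    ]
```

Strategies are indexed by list position here; only the chosen strategy is reinforced, while all propensities decay at the same rate.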
- QFP agents
Cheung and Friedman propose a generalized belief-based learning model (usually called weighted fictitious play) in which players first form expectations of what the others will do based on their own prior beliefs, which are usually the ratios of the number of times each play was submitted to the total number of moves [6].
Let $L_i(t)$ be the total count of past plays for player $i$ ($i = 1, \ldots, N$) and $B_{-i}^{s_{-i}}(t)$ be her belief that her opponents will submit $s_{-i} = (s_1^{k_1}, \ldots, s_{i-1}^{k_{i-1}}, s_{i+1}^{k_{i+1}}, \ldots, s_N^{k_N})$, where $s_{-i}$ is a vector of the other player(s)' submissions s.t. $s_{-i} \in S_{-i} = \{1, \ldots, M\}^{N-1}$, with $s_i^k$ being player $i$'s $k$-th strategy, and $s_{-i}(t)$ is the strategy vector actually chosen at turn $t$, i.e., $s_{-i}(t) = (s_1(t), \ldots, s_{i-1}(t), s_{i+1}(t), \ldots, s_N(t))$. Then we write them as

$$L_i(t) = \sum_{s_{-i} \in S_{-i}} L_{-i}^{s_{-i}}(t) \quad \text{and} \quad B_{-i}^{s_{-i}}(t) = \frac{L_{-i}^{s_{-i}}(t)}{L_i(t)},$$

respectively, with $L_{-i}^{s_{-i}}(t) \geq 0$ and $L_i(t) > 0$. Note that the possible combinations of the other player(s)' submissions are obtained from $s_i(t)$ and the winning integer $v(t)$. For instance, when player $i$ submits 1 and the winning integer is 0 (no winner) in the $(N,M) = (3,3)$ DIY-L, she can infer that the others also chose 1. Or, when player $i$ submits 1 and the winning integer is 1 (player $i$ wins) in the $(N,M) = (3,3)$ DIY-L, she will expect that the possible submissions are $\{2,2\}$ w.p. 1/4, $\{2,3\}$ w.p. 1/2, and $\{3,3\}$ w.p. 1/4. Indeed, the number of combinations for each setup is 6 in the $(N,M) = (3,3)$ DIY-L, 10 in $(N,M) = (3,4)$ and $(4,3)$, and 20 in $(N,M) = (4,4)$, respectively.
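These combination counts can be checked by enumerating the unordered submission profiles of the $N-1$ opponents, each drawn from $\{1, \ldots, M\}$; a small sketch (the function name is illustrative):

```python
from itertools import combinations_with_replacement

def opponent_combinations(N, M):
    """Number of unordered submission profiles of the N - 1 opponents,
    each choosing an integer from {1, ..., M}."""
    return len(list(combinations_with_replacement(range(1, M + 1), N - 1)))
```

Equivalently, this is the multiset coefficient $\binom{M + N - 2}{N - 1}$, which gives 6, 10, 10, and 20 for the four setups listed above.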
After a turn, their beliefs are updated as

$$B_{-i}^{s_{-i}}(t) = \frac{(1-\varphi_f) \cdot L_{-i}^{s_{-i}}(t-1) + \varphi_f \cdot G(s_i(t), v(t), s_{-i}, s_{-i}(t))}{\sum_{s_{-i} \in S_{-i}} \left[ (1-\varphi_f) \cdot L_{-i}^{s_{-i}}(t-1) + \varphi_f \cdot G(s_i(t), v(t), s_{-i}, s_{-i}(t)) \right]},$$
where $\varphi_f$ is a learning parameter and $G(s_i(t), v(t), s_{-i}, s_{-i}(t))$ is a function which determines the probability that the others chose a given set of integers.
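The belief update can be sketched as below, assuming the counts $L$ are kept per opponent profile and the $G$-values for the current observation are supplied precomputed; all names here are illustrative, not from the source:

```python
def update_beliefs(L_prev, G_probs, phi_f):
    """Weighted fictitious play belief update: mix the decayed counts
    with the probability weights G inferred from the current turn's
    observation, then normalize over all opponent profiles."""
    raw = {
        profile: (1 - phi_f) * L_prev[profile] + phi_f * G_probs[profile]
        for profile in L_prev
    }
    total = sum(raw.values())
    return {profile: value / total for profile, value in raw.items()}
```

The normalization in the last step corresponds to the denominator of the update formula, so the returned beliefs sum to one.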
Then the expected payoff for an integer $k$ at turn $t$ is calculated as

$$E_i^k(t) = \sum_{s_{-i} \in S_{-i}} B_{-i}^{s_{-i}}(t)\, \pi_i(s_i^k, s_{-i}),$$
where $\pi_i(s_i^k, s_{-i})$ is player $i$'s payoff for choosing integer $k$ at turn $t$.
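The expected payoff is simply a belief-weighted average over opponent profiles; a minimal sketch, with illustrative names and the payoff function passed in as a callable:

```python
def expected_payoff(beliefs, payoff, k):
    """Expected payoff of integer k: the belief-weighted average of
    the payoff against each possible opponent profile."""
    return sum(b * payoff(k, profile) for profile, b in beliefs.items())
```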
Finally, strategy $k$ at turn $t$ is selected based on the exponential choice rule,

$$p_i^k(t) = \frac{\exp(\lambda_f \cdot E_i^k(t))}{\sum_{j=1}^{M} \exp(\lambda_f \cdot E_i^j(t))},$$

where $p_i^k(t)$ is the probability that player $i$ selects strategy $k$ at turn $t$, and $\lambda_f$ is the sensitivity of the probabilities to expected payoffs.
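The exponential choice rule is a softmax over expected payoffs; a minimal sketch (the max-subtraction is a standard numerical-stability trick, not part of the model):

```python
import math

def choice_probabilities(expected, lam):
    """Logit (exponential) choice rule: softmax of expected payoffs
    scaled by the sensitivity parameter lam."""
    # Subtracting the max leaves the probabilities unchanged but
    # avoids overflow in exp for large lam * E values.
    m = max(expected)
    exps = [math.exp(lam * (e - m)) for e in expected]
    total = sum(exps)
    return [e / total for e in exps]
```

With $\lambda_f = 0$ the rule reduces to uniform random choice; as $\lambda_f$ grows, probability mass concentrates on the strategy with the highest expected payoff.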