After a turn, she updates her propensities as
\[
w_{i,k}(t+1) = (1-\phi_a)\, w_{i,k}(t) + \mathbf{1}_{\{s_i^k = s_i(t)\}}\, (1-\phi_a)\, R,
\]
where $R$ is the payoff, $\phi_a$ is a positive constant called the learning parameter, and $s_i(t)$ is the strategy player $i$ actually chose at turn $t$.
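A minimal sketch of this update in Python (the array layout, function name, and indicator handling are my assumptions; $R$ and $\phi_a$ follow the text):

```python
import numpy as np

def update_propensities(w, chosen_k, R, phi_a):
    """w_{i,k}(t+1) = (1 - phi_a) * w_{i,k}(t) + 1{k = chosen} * (1 - phi_a) * R."""
    w = (1.0 - phi_a) * w              # decay every propensity
    w[chosen_k] += (1.0 - phi_a) * R   # reinforce only the strategy actually chosen
    return w
```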
- QFP agents
Cheung and Friedman propose a generalized belief-based learning model (usually called weighted fictitious play) in which players first form an expectation of what the others will do based on their own prior beliefs, which are usually the ratios of the number of times each play has been submitted to the total number of moves [6].
Let $L_i(t)$ be the total count of possible past plays for player $i$ ($i = 1, \ldots, N$) and $B_{-i}^{s_{-i}}(t)$ be her belief that her opponents will submit $s_{-i} = (s_1^{k_1}, \ldots, s_{i-1}^{k_{i-1}}, s_{i+1}^{k_{i+1}}, \ldots, s_N^{k_N})$, where $s_{-i}$ is a vector of the other player(s)' submissions s.t. $s_{-i} \in S_{-i} = \{1, \ldots, M\}^{N-1}$, with $s_i^k$ being player $i$'s $k$-th strategy, and $s_{-i}(t)$ is the strategy vector presumably chosen at turn $t$, s.t. $s_{-i}(t) = (s_1(t), \ldots, s_{i-1}(t), s_{i+1}(t), \ldots, s_N(t))$. Then we write them as
\[
L_i(t) = \sum_{s_{-i} \in S_{-i}} L_{-i}^{s_{-i}}(t) \quad \text{and} \quad B_{-i}^{s_{-i}}(t) = \frac{L_{-i}^{s_{-i}}(t)}{L_i(t)},
\]
respectively, with $L_{-i}^{s_{-i}}(t) \geq 0$ and $L_i(t) > 0$. Note that the possible combinations of the other player(s)' submissions can be inferred from $s_i(t)$ and the winning integer $v(t)$. For instance, when player $i$ submits 1 and the winning integer is 0 (no winner) in $(N,M) = (3, \cdot)$ DIY-L, she can infer that the others also chose 1. Or, when player $i$ submits 1 and the winning integer is 1 (player $i$ wins) in $(N,M) = (3,3)$ DIY-L, she will expect that the possible submissions are $\{2,2\}$ w.p. 1/4, $\{2,3\}$ w.p. 1/2, and $\{3,3\}$ w.p. 1/4. Indeed, the numbers of combinations for each setup are 6 in $(N,M) = (3,3)$ DIY-L, 10 in $(N,M) = (3,4)$ and $(4,3)$, and 20 in $(N,M) = (4,4)$, respectively.
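The combination counts quoted above can be checked directly by enumerating unordered submissions; a short sketch (the function name is illustrative):

```python
from itertools import combinations_with_replacement

def opponent_combinations(N, M):
    """Unordered submissions of the N-1 other players over integers 1..M."""
    return list(combinations_with_replacement(range(1, M + 1), N - 1))

# Prints 6 for (3, 3), 10 for (3, 4) and (4, 3), and 20 for (4, 4).
for N, M in [(3, 3), (3, 4), (4, 3), (4, 4)]:
    print((N, M), len(opponent_combinations(N, M)))
```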
After a turn, their beliefs are updated as
\[
B_{-i}^{s_{-i}}(t) = \frac{(1-\phi_f) \cdot L_{-i}^{s_{-i}}(t-1) + \phi_f \cdot G(s_i(t), v(t), s_{-i}, s_{-i}(t))}{\sum_{s'_{-i} \in S_{-i}} \left[ (1-\phi_f) \cdot L_{-i}^{s'_{-i}}(t-1) + \phi_f \cdot G(s_i(t), v(t), s'_{-i}, s_{-i}(t)) \right]},
\]
where $\phi_f$ is a learning parameter and $G(s_i(t), v(t), s_{-i}, s_{-i}(t))$ is a function that determines the probability that the others chose a given set of integers.
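A sketch of this update, assuming the pseudo-counts $L_{-i}^{s_{-i}}$ and the $G$ values are stored per combination $s_{-i}$ in dictionaries with identical keys (the data layout is an assumption):

```python
def update_beliefs(L_prev, G_values, phi_f):
    """Blend decayed pseudo-counts with the G term, then normalise into beliefs.

    L_prev and G_values map each combination s_{-i} to a nonnegative number;
    G_values[s] stands in for G(s_i(t), v(t), s, s_{-i}(t))."""
    new_counts = {s: (1.0 - phi_f) * L_prev[s] + phi_f * G_values[s]
                  for s in L_prev}
    total = sum(new_counts.values())
    return {s: c / total for s, c in new_counts.items()}
```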
Then the expected payoff for an integer $k$ at turn $t$ is calculated as
\[
E_i^k(t) = \sum_{s_{-i} \in S_{-i}} B_{-i}^{s_{-i}}(t)\, \pi_i(s_i^k, s_{-i}),
\]
where $\pi_i(s_i^k, s_{-i})$ is player $i$'s payoff for choosing integer $k$ at turn $t$.
Finally, strategy $k$ at turn $t$ is selected based on the exponential choice rule,
\[
p_i^k(t) = \frac{\exp(\lambda_f \cdot E_i^k(t))}{\sum_{k'=1}^{M} \exp(\lambda_f \cdot E_i^{k'}(t))},
\]
where $p_i^k(t)$ is the probability that player $i$ selects strategy $k$ at turn $t$, and $\lambda_f$ is the sensitivity of probabilities to expected payoffs.
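The logit rule itself is a short computation; a sketch (the max-shift for overflow protection is my addition and leaves the probabilities unchanged):

```python
import numpy as np

def choice_probabilities(E, lam_f):
    """p_i^k(t) = exp(lam_f * E_i^k(t)) / sum_k' exp(lam_f * E_i^k'(t))."""
    z = lam_f * np.asarray(E, dtype=float)
    z -= z.max()                 # stabilise before exponentiating
    p = np.exp(z)
    return p / p.sum()
```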