Rawls' measures) in order to test their influence on the simulation results. The number
of time-steps of the simulation might also be changed, allowing us to analyze the influ-
ence of the learning process on the results. Finally, the maximum guilt aversion level
and the discretization step of the guilt aversion may also be altered.
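To make this experimental setup concrete, a possible grouping of these parameters is sketched below in Python; the default values and all names except guiltAversionInitMax are hypothetical placeholders, not values taken from the model.

```python
# Hypothetical experiment parameters (names and values are illustrative only,
# except guiltAversionInitMax, which is the global parameter of the model).
simulation_parameters = {
    "n_time_steps": 1000,          # number of time-steps (length of the learning process)
    "guiltAversionInitMax": 4.0,   # maximum guilt aversion level
    "guilt_aversion_step": 0.5,    # discretization step of the guilt aversion levels
}
```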
Agents
The model is composed of a single kind (or species) of agent, named 'people'. Each agent is characterized by a guilt aversion level (guiltAversion), a positive floating-point number less than or equal to the global parameter (guiltAversionInitMax), and a history of previous interactions. Each agent i's history is a complex structure (a mapping) associating each other agent j already met with a list containing: (1) the number of interactions between both agents, (2) the number of interactions in which j chose to cooperate with i, and (3) the overall payoff earned by i from these interactions with agent j. As we will see in the following paragraph, the first two elements of the list are taken into account in the computation of the expected utility, whereas the last one only serves as an indicator of the 'quality' of the interaction between both agents. From the expected utility obtained by combining these first two elements, each agent computes a guilt-dependent utility matrix (containing modified utility values U) from the game utility matrix. It is also important to note that agents are not aware of the guilt aversion level of their interaction partners; they thus make their decisions based only on other agents' behavior (their moves).
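As an illustration, the per-partner history and the empirical cooperation frequency derived from its first two elements could be represented as in the following Python sketch; the class and attribute names, and the 0.5 prior used for partners never met, are our own assumptions rather than part of the model.

```python
from dataclasses import dataclass, field

@dataclass
class InteractionRecord:
    """Per-partner history kept by an agent, as described above."""
    n_interactions: int = 0      # (1) number of interactions with this partner
    n_cooperations: int = 0      # (2) number of times the partner cooperated
    total_payoff: float = 0.0    # (3) cumulative payoff earned from this partner

@dataclass
class Person:
    guilt_aversion: float                                   # in [0, guiltAversionInitMax]
    history: dict[int, InteractionRecord] = field(default_factory=dict)

    def expected_cooperation(self, partner_id: int) -> float:
        """Empirical probability that this partner cooperates,
        computed from elements (1) and (2) of the history."""
        rec = self.history.get(partner_id)
        if rec is None or rec.n_interactions == 0:
            return 0.5  # hypothetical prior for an unknown partner
        return rec.n_cooperations / rec.n_interactions
```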
Learning: Fictitious Play
In order to explain Nash equilibrium (and selection among various Nash equilibria), game theorists have traditionally used different kinds of adjustment models (cf., for example, [27] or [14]): mainly replicator dynamics (i.e. the relative prevalence of any strategy has a growth rate proportional to its payoff relative to the average payoff) and simple belief learning (i.e. players adjust their beliefs as they accumulate experience, and these current beliefs influence their current choice of strategy). As shown empirically by [10], in both symmetric (single-population) and two-type population games, “the learning model is slightly better at explaining the single population data and much better at explaining the two population data.”
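For comparison, the replicator-dynamics idea mentioned above (which our simulation does not use) can be written as a one-step update of strategy shares; the discrete-time form and function name below are our own sketch, assuming strictly positive payoffs.

```python
import numpy as np

def replicator_step(shares: np.ndarray, payoffs: np.ndarray) -> np.ndarray:
    """One discrete replicator update: each strategy's share grows in
    proportion to its payoff relative to the population-average payoff.
    Assumes strictly positive payoffs so the shares remain a distribution."""
    average_payoff = shares @ payoffs               # population-average payoff
    new_shares = shares * payoffs / average_payoff  # growth proportional to relative payoff
    return new_shares / new_shares.sum()            # guard against rounding drift
```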
Thus, in our simulation, we use a simple belief learning process known as 'fictitious play', or the 'Brown-Robinson learning process'. The algorithm was introduced by Brown [8] as a method for finding the value of a zero-sum game and was first studied by Robinson [23]. It assumes that players noiselessly best respond to the belief that other players' current actions will equal the average of their actions in all earlier periods. Informally, it can be described as follows. Consider two players playing a finite game repeatedly. After arbitrary initial moves in the first round, in which each player chooses a single pure strategy, both players construct sequences of strategies according to the following rule: at each step, a player considers the sequence of moves chosen so far by the other player, supposes that the other player will randomize uniformly over that sequence, and chooses a best response to that mixed strategy. That is, in every round
each player plays a myopic pure best response against the empirical strategy distribution
of her opponent (a player's sequence is treated as a multi-set of pure strategies, one of which is assumed to be drawn uniformly at random).
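A minimal sketch of this best-response rule, assuming a generic payoff matrix for the row player and our own function name (neither taken from the original model), could look as follows:

```python
import numpy as np

def fictitious_play_response(payoff_matrix: np.ndarray,
                             opponent_history: list[int]) -> int:
    """Myopic pure best response against the empirical distribution of the
    opponent's past moves, as in fictitious play.

    payoff_matrix[a, b] is the player's payoff when she plays a and the
    opponent plays b; opponent_history lists the opponent's past moves
    (column indices) and is assumed non-empty after the first round."""
    n_opponent_moves = payoff_matrix.shape[1]
    counts = np.bincount(opponent_history, minlength=n_opponent_moves)
    empirical_mix = counts / counts.sum()             # opponent's empirical mixed strategy
    expected_payoffs = payoff_matrix @ empirical_mix  # expected payoff of each own move
    return int(np.argmax(expected_payoffs))           # best response (ties broken by lowest index)
```

For instance, against an opponent who has cooperated twice and defected once, the empirical mixture is (2/3, 1/3), and the function returns the move maximizing the expected payoff against that mixture.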
 