The RD, on this reading, purportedly describes a class of learning processes. Within that class, three kinds of learning can be distinguished: reinforcement, imitation, and belief learning.
In reinforcement learning, a player's received payoffs from past interactions are her only feedback information. That is, the probability that a strategy is played in the future is proportional to the success it gave the player in the past. Börgers and Sarin (1997) present a well-known model of such learning, which conforms to the replicator dynamic. In their model, a player at stage $n$ plays a mixed strategy $P(n) = (P_1(n), \ldots, P_J(n))$ that includes all possible pure strategies $S_1, \ldots, S_J$ in the population. The player $i$ observes the (pure) strategy $S_k$ and its payoff $u_i(S_k, S_k)$, normalized to lie between 0 and 1, that is realised when she implements her mixed strategy against other players playing $S_k$. She then 'learns' by adjusting the weight $P_k$ of $S_k$ in her mixed strategy in proportion to the payoff that $S_k$ gave her, by the following rule:
$$P_k(n+1) = u_i(S_k, S_k) + \bigl(1 - u_i(S_k, S_k)\bigr)\, P_k(n) \qquad (5.2)$$

and, for all $k' \neq k$,

$$P_{k'}(n+1) = \bigl(1 - u_i(S_k, S_k)\bigr)\, P_{k'}(n)$$
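To make the updating rule concrete, here is a minimal Python sketch of the Cross learning update in Eq. (5.2). The function name and the example numbers are illustrative assumptions, not part of Börgers and Sarin's presentation; the payoff is taken as already normalized to [0, 1].

```python
def cross_update(P, k, payoff):
    """Cross learning rule (Eq. 5.2): after pure strategy k is realised
    with a payoff normalized to [0, 1], the weight of k is pushed towards 1
    in proportion to the payoff, and every other weight shrinks by the
    factor (1 - payoff), so the weights still sum to 1."""
    assert 0.0 <= payoff <= 1.0, "payoffs must be normalized to [0, 1]"
    return [payoff + (1.0 - payoff) * p if j == k else (1.0 - payoff) * p
            for j, p in enumerate(P)]

# Illustrative use: three pure strategies, uniform initial mixed strategy;
# S_1 (index 0) is realised and yields a normalized payoff of 0.6.
P = cross_update([1/3, 1/3, 1/3], k=0, payoff=0.6)
print(P)  # [0.7333..., 0.1333..., 0.1333...] -- mass shifts towards S_1
```

Note that the update is a convex combination of the old mixed strategy and the degenerate strategy on $S_k$, with the normalized payoff as the mixing weight; this is why the normalization to [0, 1] matters.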
For the specific case of only two actions, the expected movement of the action probabilities under this model equals the RD, rescaled by a constant (Börgers and Sarin 1997; Börgers et al. 2004). More generally, if the decision-maker uses Cross's learning rule (and the model's other requirements are satisfied), then the learning dynamics satisfies monotonicity and absolute expediency (Börgers et al. 2004); roughly, better-performing actions grow faster in probability, and the player's expected payoff increases from period to period. Both of these properties are also satisfied by the RD. Thus, there is an analogy between Cross learning and the RD. Börgers et al. (2004, p. 358) conclude from this that their results 'strengthen the case of the use of RD dynamics in contexts where learning is important'. They also speculate that it may be possible to adapt their results 'to an evolutionary setting' (Börgers et al. 2004, p. 400), but refrain from making any specific claims about this.
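The two-action claim can be checked numerically. The sketch below is an illustration under an assumed (hypothetical) symmetric 2×2 payoff matrix with entries already normalized to [0, 1]; it estimates the expected one-step movement of the first action's probability under Cross learning by simulation and compares it with the replicator expression $p(1-p)(u_1 - u_2)$.

```python
import random

# Hypothetical symmetric 2x2 game, payoffs normalized to [0, 1].
A = [[0.4, 0.9],
     [0.7, 0.2]]

p = 0.3  # current probability of the first action

# Expected payoffs of the two actions against an opponent who also
# plays the mixed strategy (p, 1 - p).
u1 = A[0][0] * p + A[0][1] * (1 - p)
u2 = A[1][0] * p + A[1][1] * (1 - p)

# Replicator dynamic for two actions: dp/dt = p (1 - p) (u1 - u2).
rd = p * (1 - p) * (u1 - u2)

# Expected one-step movement of p under Cross learning, by simulation.
trials = 200_000
total = 0.0
for _ in range(trials):
    own = 0 if random.random() < p else 1   # realised own action
    opp = 0 if random.random() < p else 1   # realised opponent action
    u = A[own][opp]                         # realised normalized payoff
    # Eq. (5.2) applied to the first action's probability:
    p_next = u + (1 - u) * p if own == 0 else (1 - u) * p
    total += p_next - p

print(rd, total / trials)  # the two values agree up to sampling noise
```

Here the agreement is exact because the payoffs are already normalized; in general the expected movement matches the RD up to the constant rescaling mentioned above.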
The reinforcement interpretation of the RD model can be presented graphically, as shown in Fig. 5.3.

[Fig. 5.3 The reinforcement interpretation of the RD]
This interpretation differs in a number of features from the BRD. It commences with
agents playing mixed strategies (where all organisms share the same support) rather
than pure strategies. These strategies are not inherited, but adopted and adjusted by
the agents. It does not interpret payoffs as fitness, but as subjectively evaluated
outcomes. It is these subjective evaluations that cause the agent's adjustment of her
own strategies. And it is this adjustment, and not differential reproduction, that
constitutes differential representation in the population.
In imitation learning, players occasionally sample other players in the population
and learn about their strategy and the payoff they realised in the last round. They