program was designed to learn a successful strategy for playing a coin-
tossing game that had been suggested by Shannon. In this game one
computer would choose heads or tails and another computer would try
to guess which. The guessing computer attempts to detect a pattern in
the choosing computer's choices, while the choosing computer attempts
to detect a pattern in the guessing computer's guesses. The game is made
more complicated by the fact that each of the computers will attempt to
change its pattern sufficiently often to flummox its opponent.
Kirsch devised a learning mechanism that seemed to him like a model
of the way in which an animal learns. His animal model was then used
to play the coin-matching game against a choosing program. Evidently
a learning mechanism must respond to some sort of stimulus, which,
in the case of the coin-tossing game, was defined as follows. At each
matching of the coin two numbers were generated, one corresponding
to the chooser's move and one corresponding to the animal's guess at the
opponent's move. Since there are only two possibilities, heads (H) and
tails (T), binary numbers were used (1s and 0s).
Kirsch's “animal” learning program was designed to react to the past
four pairs of moves. A pair of moves provides four possible combinations
(HH, HT, TT, TH), and four pairs of moves provide 4 × 4 × 4 × 4 = 256
possible combinations (called stimuli).
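
The count of 256 stimuli can be checked with a minimal sketch that packs the last four pairs of moves into a single index. The text says only that binary numbers were used; the mapping H = 1, T = 0, the ordering of the bits, and the names `MOVE_BITS` and `encode_stimulus` are assumptions made for illustration, not Kirsch's actual encoding.

```python
from itertools import product

# Assumed mapping: the text states that 1s and 0s were used,
# but not which digit stood for heads.
MOVE_BITS = {"H": 1, "T": 0}


def encode_stimulus(history):
    """Map the last four (chooser, guesser) move pairs to an index in 0..255."""
    if len(history) < 4:
        raise ValueError("need at least four pairs of moves")
    index = 0
    for chooser, guesser in history[-4:]:
        pair = MOVE_BITS[chooser] * 2 + MOVE_BITS[guesser]  # one of 0..3
        index = index * 4 + pair  # four base-4 digits give 4**4 = 256 values
    return index


# Enumerate every possible sequence of four pairs and collect the indices.
ALL_STIMULI = {
    encode_stimulus(h) for h in product(product("HT", repeat=2), repeat=4)
}
print(len(ALL_STIMULI))  # 256
```

Each pair contributes one base-4 digit, so four pairs yield exactly the 256 distinct stimuli described above.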
The learning mechanism consisted of the animal becoming condi-
tioned to a certain move by the choosing opponent after each of the 256
possible stimuli. For each stimulus, the animal noted what move the
opponent made next. Then, the next time that same stimulus occurred,
i.e., the same sequence of four pairs of moves, the animal duplicated
the move of the opponent that followed that same stimulus on the pre-
vious occasion it occurred. The more the choosing opponent repeated
the same move after any given stimulus, the more the animal program
became “conditioned” to that move.
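
The duplication rule described above might be sketched as follows. This is an illustration only: the class name `LastMovePredictor`, the use of a dictionary keyed by stimulus, and the default guess for a stimulus that has not yet been seen are all assumptions, and the sketch covers only the "repeat what followed last time" step, not the strength of the conditioning.

```python
class LastMovePredictor:
    """Guess that the opponent repeats whatever followed this stimulus last time."""

    def __init__(self):
        # stimulus -> the opponent's move that followed it on the previous occasion
        self.memory = {}

    def guess(self, stimulus, default="H"):
        # The default for an unseen stimulus is an assumption;
        # the text does not say what the program did in that case.
        return self.memory.get(stimulus, default)

    def observe(self, stimulus, opponent_move):
        # Remember what the opponent actually did after this stimulus.
        self.memory[stimulus] = opponent_move


predictor = LastMovePredictor()
stimulus = (("H", "T"), ("T", "T"), ("H", "H"), ("T", "H"))  # four (chooser, guesser) pairs
predictor.observe(stimulus, "T")
print(predictor.guess(stimulus))  # T
```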
For each of the 256 possible stimuli the animal program stored a
number, called the conditioning number, which varied between
-1 and +1. Whenever the animal scored a success in predicting the opponent's
choice, the number in the animal program's storage location for the ap-
propriate stimulus was increased. 15
Thus, if the opponent always fol-

15 The size of this increase was a fixed fraction of the difference between its previous value and
+1. Thus, if this fraction were 1/2 and if the value associated with a particular stimulus was -1, then
the next increase to the conditioning number would be 1/2 × 2, since the difference between -1
(its previous value) and +1 is 2. So the increase would be 1 and the new value would therefore be
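
The footnote's arithmetic can be traced with a short sketch of the increase rule. The function name `reinforce` is hypothetical, and the fraction of 1/2 is the footnote's own illustrative value rather than a figure confirmed for Kirsch's program; only the increase after a successful prediction is modelled here.

```python
def reinforce(value, fraction=0.5, target=1.0):
    """Raise a conditioning number by a fixed fraction of its distance to +1."""
    return value + fraction * (target - value)


value = -1.0
for _ in range(3):
    value = reinforce(value)
    print(value)
# prints 0.0, then 0.5, then 0.75
```

Under this rule each successive increase is smaller than the last, so the conditioning number approaches +1 without exceeding it.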