the information about its moves in that particular game, but if it won the
game it would retain the information and use it to reinforce the information
it had previously acquired. This reinforcement process increased the
weight of the moves played in winning games. As the number of games
it played increased, so would the amount of data it built up for each
position that it encountered in the games that it won, with the best moves
building up the largest weights. Whenever it encountered a position, the
machine would select the move with the largest reinforcement weight. As it
accumulated more reinforcement data for the moves played in the games it
won, it would come to play the best moves most often and the worst moves
least often.
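The win-only reinforcement scheme described above can be sketched roughly as follows. This is a minimal illustration, not de Latil's own formulation: the names `weights`, `choose_move` and `record_game`, the fixed `reward` of 1.0, and the tie-breaking rule are all assumptions made for the example.

```python
from collections import defaultdict

# weights[position][move] holds the accumulated reinforcement.
# The nested-dictionary layout is an assumption made for this sketch.
weights = defaultdict(lambda: defaultdict(float))

def choose_move(position, legal_moves):
    """Return the legal move with the largest weight for this position.

    Ties (including the all-zero case before any learning) go to the
    first move listed, which is an arbitrary choice for illustration.
    """
    return max(legal_moves, key=lambda move: weights[position][move])

def record_game(history, won, reward=1.0):
    """Reinforce every (position, move) pair of a won game.

    A lost game is simply discarded, as in the original scheme.
    """
    if not won:
        return
    for position, move in history:
        weights[position][move] += reward
```

Here `history` is assumed to be the list of (position, move) pairs the machine played in one game, with positions and moves represented by any hashable values.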
De Latil later proposed a modification to his algorithm. Rather than
discard the information pertaining to the games that it lost, the machine
should collect that data and employ it to reduce the likelihood of its
repeating bad moves. Just as the good moves received reinforcement in
the form of an increased weighting, so the bad moves would have their
weightings reduced.
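The modified scheme, in which lost games lower the weights instead of being discarded, might look like the sketch below. The function name `update_weights` and the equal `reward` and `penalty` sizes are hypothetical; the text does not say how large the reduction should be.

```python
from collections import defaultdict

# weights[position][move] holds the accumulated reinforcement
# (an assumed layout, chosen for this sketch).
weights = defaultdict(lambda: defaultdict(float))

def update_weights(history, won, reward=1.0, penalty=1.0):
    """Raise the weights of moves from a won game, lower them after a loss.

    The step sizes are assumptions; only the direction of the change
    comes from the description above.
    """
    delta = reward if won else -penalty
    for position, move in history:
        weights[position][move] += delta
```

With this rule a move that appears in one win and one loss ends up back at its starting weight, so only moves that win more often than they lose accumulate a positive weighting.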
Donald Michie employed de Latil's idea in the construction of MENACE,²²
a Tic-Tac-Toe playing system constructed as an assemblage of
matchboxes (see Figure 46²³). MENACE consisted of 288 matchboxes,
each one corresponding to one of the 288 essentially different positions
with which the first player can be confronted in a game of Tic-Tac-Toe.
Each of the boxes functioned as a separate learning machine, tasked
only with making a decision when its own unique position arises in a
game. In each box there were a number of coloured beads, the various
colours representing codes for the different locations on the board where
MENACE could make a move. When a particular board position was
encountered in a game, MENACE's operator would open the appropriate
matchbox and shake it. Each matchbox had a V-shaped cardboard
fence fixed in the front, so that when the box was tilted forward one of
the beads was selected at random by being the first bead to roll into the
apex of the V. The operator would then make the corresponding move
in the game, leaving the box open, with its randomly selected bead
visible, until the end of the game. If MENACE won the game then all of
the boxes corresponding to the moves it had made in that game would
22 MENACE: Matchbox Educable Noughts And Crosses Engine.
23 This figure originally appeared on the page facing page 137 of Machine Intelligence 2, edited by
Ella Dale and Donald Michie, Oliver and Boyd, Edinburgh, 1968.
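The bead draw from a single matchbox can be approximated in software as a weighted random choice. The sketch below assumes that every bead is equally likely to roll into the apex of the V, so a square is chosen with probability proportional to its bead count. The function name `draw_bead`, the dictionary layout, and the bead counts in the usage note are hypothetical.

```python
import random

def draw_bead(box, rng=random):
    """Pick one board square at random, weighted by bead count.

    `box` maps a board square (any hashable label) to the number of
    beads of that square's colour currently in the matchbox.
    """
    squares = [square for square, count in box.items() if count > 0]
    if not squares:
        raise ValueError("matchbox contains no beads")
    counts = [box[square] for square in squares]
    # random.choices draws with probability proportional to the weights.
    return rng.choices(squares, weights=counts, k=1)[0]
```

For example, `draw_bead({0: 4, 4: 1})` would return square 0 about four times as often as square 4 over many draws.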