How Computers Think - Robots Unlimited: Life in a Virtual Age

the information about its moves in that particular game, but if it won the game it would retain the information and use it to reinforce the information it had previously acquired. This reinforcement process increased the weight of the moves played in winning games. As the number of games it played increased, so would the amount of data it built up for each position that it encountered in the games that it won, with the best moves building up the largest weights. Whenever it encountered a position, the machine would select the move corresponding to the largest weight of reinforcement data, and gradually the machine would acquire more and more reinforcement data for those moves that it played in the games that it won, leading it to play the best moves most often and the worst moves least often.
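
The win-only reinforcement scheme described above might be sketched roughly as follows. This is an illustration under stated assumptions, not de Latil's own formulation: the class name `WinOnlyLearner`, the unit increment per winning move, and the tie-breaking rule (the first legal move listed wins a tie) are all hypothetical choices made for the example.

```python
from collections import defaultdict


class WinOnlyLearner:
    """Hypothetical sketch: reinforce moves only from games that were won."""

    def __init__(self):
        # weights[position][move] holds the accumulated reinforcement.
        self.weights = defaultdict(lambda: defaultdict(int))
        # (position, move) pairs played in the game currently in progress.
        self.history = []

    def choose(self, position, legal_moves):
        table = self.weights[position]
        # Select the move with the largest weight. max() returns the first
        # maximal item, so ties go to the earliest legal move (an assumption).
        move = max(legal_moves, key=lambda m: table[m])
        self.history.append((position, move))
        return move

    def end_game(self, won):
        if won:
            # Reinforce every move played in the winning game.
            for position, move in self.history:
                self.weights[position][move] += 1
        # After a loss the record of the game is simply discarded.
        self.history.clear()
```

A caller would invoke `choose` once per turn and `end_game` once per game. Because selection here is purely greedy, a move that has never featured in a won game can only be picked through the tie-breaking rule.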

De Latil later proposed a modification to his algorithm. Rather than discard the information pertaining to the games that it lost, the machine should collect that data and employ it to reduce the likelihood of it repeating bad moves. Just as the good moves received reinforcement in the form of an increased weighting, so the bad moves would have their weightings reduced.
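
The modified scheme, in which lost games lower the weights instead of being discarded, could look something like the sketch below. The function names, the `reward` and `penalty` parameters, and their default value of 1 are assumptions made for illustration; the text does not say how large the adjustments were, nor how drawn games were treated.

```python
from collections import defaultdict


def make_weights():
    # weights[position][move] -> signed reinforcement total (starts at 0).
    return defaultdict(lambda: defaultdict(int))


def update_weights(weights, history, won, reward=1, penalty=1):
    """Adjust the weight of every (position, move) pair played in one game.

    Moves from a won game are increased by `reward`; moves from a lost
    game are decreased by `penalty`. Both amounts are assumed values.
    """
    delta = reward if won else -penalty
    for position, move in history:
        weights[position][move] += delta


def best_move(weights, position, legal_moves):
    # Pick the legal move with the largest weight for this position.
    return max(legal_moves, key=lambda m: weights[position][m])
```

With this variant a move that contributed to a loss falls below untried moves, so the machine steers away from it even before it has found a winning alternative.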

Donald Michie employed de Latil's idea in the construction of MENACE,[22] a Tic-Tac-Toe playing system constructed as an assemblage of matchboxes (see Figure 46[23]). MENACE consisted of 288 matchboxes, each one corresponding to one of the 288 essentially different positions with which the first player can be confronted in a game of Tic-Tac-Toe.

Each of the boxes functioned as a separate learning machine, tasked only with making a decision when its own unique position arose in a game. In each box there were a number of coloured beads, the various colours representing codes for the different locations on the board where MENACE could make a move. When a particular board position was encountered in a game, MENACE's operator would open the appropriate matchbox and shake it. Each matchbox had a V-shaped cardboard fence fixed in the front, so that when the box was tilted forward one of the beads was selected at random by being the first bead to roll into the apex of the V. The operator would then make the corresponding move in the game, leaving the box open, with its randomly selected bead visible, until the end of the game. If MENACE won the game then all of the boxes corresponding to the moves it had made in that game would

[22] MENACE: Matchbox Educable Noughts And Crosses Engine.

[23] This figure originally appeared on the page facing page 137 of Machine Intelligence 2, edited by Ella Dale and Donald Michie, Oliver and Boyd, Edinburgh, 1968.
