How Computers Play Games - Robots Unlimited: Life in a Virtual Age

that each of the moves selected by Tesauro was stronger than all the other legal moves available in the position. The network employed to decide when to double, 44 when to accept a proffered double, and when to resign in the face of a double was trained on a separate set of about 3,000 positions.

TD-Gammon

In 1989, in London, Neurogammon won the Backgammon gold medal at the first Computer Olympiad, in competition against three commercially available programs and two non-commercial entries. Tesauro's next Backgammon program was called TD-Gammon. 45 While Neurogammon had learned from the games of human experts, TD-Gammon learned by playing against itself, without the aid of any supervision provided by an intelligent “teacher”. The result was that TD-Gammon greatly surpassed all previous computer programs in the level of its play, employing a learning method based on the approach of Richard Sutton, an extension of Samuel's work on Checkers.
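
The idea behind temporal difference learning, in which each prediction is nudged toward the prediction made one step later, can be illustrated on a problem far simpler than Backgammon. The following Python sketch is not Tesauro's program. It assumes a toy random walk over five positions, a plain table of values in place of a neural network, and a hypothetical function name, td0_random_walk, with illustrative parameters alpha and episodes. No teacher supplies correct answers; the only outside signal is the win or loss at the end of each walk.

```python
import random


def td0_random_walk(episodes=20000, alpha=0.02, n_states=5, seed=0):
    """Estimate winning chances on a toy random walk with tabular TD(0).

    This is an illustrative assumption, not TD-Gammon: positions are numbered
    0..n_states-1, stepping off the right end counts as a win (1.0) and
    stepping off the left end counts as a loss (0.0).
    """
    rng = random.Random(seed)
    # values[s] is the current prediction of the chance of winning from s.
    values = [0.5] * n_states
    for _ in range(episodes):
        state = n_states // 2  # every walk starts in the middle
        while True:
            next_state = state + rng.choice((-1, 1))
            finished = next_state < 0 or next_state >= n_states
            if next_state < 0:
                target = 0.0  # walked off the left end: a loss
            elif next_state >= n_states:
                target = 1.0  # walked off the right end: a win
            else:
                # No teacher: the later prediction serves as the target.
                target = values[next_state]
            # Temporal difference update: shift the earlier prediction
            # a small step (alpha) toward the later one.
            values[state] += alpha * (target - values[state])
            if finished:
                break
            state = next_state
    return values
```

Calling td0_random_walk() returns five estimates. With enough walks they should settle close to 1/6, 2/6, 3/6, 4/6 and 5/6, the true chances of finishing on the right from each position.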

While developing TD-Gammon, Tesauro was rather surprised to find that a substantial amount of learning took place, even though his program started with zero knowledge about how to play the game well. During the first few thousand training games, the program's neural networks learned a number of elementary strategies and tactics. More sophisticated concepts emerged later, after several tens of thousands of training games. And as the size of the networks and the amount of training experience increased, substantial improvements in performance were observed. Without being given the benefit of any outside expertise, an early version of TD-Gammon was able to play at approximately the same level as Neurogammon. Furthermore, it appeared capable of automatically discovering features that could be employed to enhance the evaluation function, an achievement in what was, in the early 1990s, still a new field of research. 46

Once the Temporal Difference neural networks were able to play as well as Neurogammon, despite having been primed with virtually no

44 One of the most interesting aspects of Backgammon is the doubling cube, whose faces are numbered 2, 4, 8, 16, 32 and 64. At the start of a game the players are competing for one point in the score table, but during the game the players may agree to double the stakes, and redouble... and so on. Deciding when to offer to double the stakes, and when to accept or reject such an offer, is a major factor in distinguishing stronger Backgammon players from weaker ones.

45 TD = Temporal Difference.

46 See the section “Playing Metagames” earlier in this chapter.
