that each of the moves selected by Tesauro was stronger than all the other
legal moves available in the position. The network employed to decide
when to double (see note 44), when to accept a proffered double and when
to resign in the face of a double was trained on a separate set of about
3,000 positions.
TD-Gammon
In 1989, in London, Neurogammon won the Backgammon gold medal at the
first Computer Olympiad, in competition against three commercially
available programs and two non-commercial entries. Tesauro's next
Backgammon program was called TD-Gammon (see note 45). While Neurogammon
had learned from the games of human experts, TD-Gammon learned by playing
against itself, without the aid of any supervision provided by an
intelligent “teacher”. The result was that TD-Gammon greatly surpassed
all previous computer programs in the level of its play. It employed a
temporal-difference learning method based on the approach of Richard
Sutton, itself an extension of Samuel's work on Checkers.
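The temporal-difference idea can be illustrated on a much smaller problem than Backgammon. The sketch below is not Tesauro's program, which trained neural networks through self-play. It is a minimal tabular example on a toy random walk, in which each state's predicted chance of winning is nudged toward the prediction made one step later. The function name td0_random_walk and the parameters alpha, episodes, n_states and seed are assumptions chosen for illustration.

```python
import random


def td0_random_walk(episodes=5000, alpha=0.05, n_states=5, seed=0):
    """Estimate win probabilities for a toy random walk with tabular TD(0).

    Illustrative stand-in only: a "win" means stepping off the right-hand end,
    a "loss" means stepping off the left-hand end.
    """
    rng = random.Random(seed)
    # values[s] is the current prediction of winning from state s.
    values = [0.5] * n_states
    for _ in range(episodes):
        s = n_states // 2  # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            if s_next < 0:
                target = 0.0  # stepped off the left end: a loss
            elif s_next >= n_states:
                target = 1.0  # stepped off the right end: a win
            else:
                target = values[s_next]  # bootstrap from the later prediction
            # Temporal-difference update: move this prediction toward the target.
            values[s] += alpha * (target - values[s])
            if s_next < 0 or s_next >= n_states:
                break
            s = s_next
    return values


if __name__ == "__main__":
    print([round(v, 2) for v in td0_random_walk()])
```

No teacher supplies the correct value of any state. The only outside signal is the win or loss at the end of each walk, and the estimates should nevertheless drift toward the true probabilities of 1/6, 2/6, ..., 5/6.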
While developing TD-Gammon, Tesauro was rather surprised to find that
a substantial amount of learning took place, even though his program
started with zero knowledge about how to play the game well. During the
first few thousand training games, the program's neural networks learned
a number of elementary strategies and tactics. More sophisticated
concepts emerged later, after several tens of thousands of training
games. And as the size of the networks and the amount of training
experience increased, substantial improvements in performance were
observed. Without being given the benefit of any outside expertise, an
early version of TD-Gammon was able to play at approximately the same
level as Neurogammon. Furthermore, it appeared capable of automatically
discovering features that could be employed to enhance the evaluation
function, an achievement in what was, in the early 1990s, still a new
field of research (see note 46).
Once the Temporal Difference neural networks were able to play as
well as Neurogammon, despite having been primed with virtually no
44 One of the most interesting aspects of Backgammon is the doubling cube, whose faces are
numbered 2, 4, 8, 16, 32 and 64. At the start of a game the players are competing for one point in
the score table, but during the game the players may agree to double the stakes, and redouble... and
so on. Deciding when to offer to double the stakes, and when to accept or reject such an offer, is a
major factor in distinguishing stronger Backgammon players from weaker ones.
45 TD = Temporal Difference.
46 See the section “Playing Metagames” earlier in this chapter.