Figure 25. Arthur Samuel with IBM computer (Courtesy of IBM Corporate Archives)
The second method of learning devised by Samuel was designed to enable
the evaluation function to improve itself by improving the weightings
assigned to each of its features. The method is based on the realisation
that the backed-up score for the root position in the game tree should
ideally be the same as the score found when the evaluation function is
applied directly to that same position. During play, Samuel's program
would keep track of how much each of the features in its evaluation
function had contributed to the overall score for a position, and by how
much the backed-up score differed from the static evaluation of the same
position. These differences were used to correct the weightings for each
of the features in the evaluation function, which tended to make future
differences smaller. This particular approach is today called Temporal
Difference learning [17] and was employed, almost 40 years later, by
another IBM researcher, Gerald Tesauro, in his world-class backgammon
program. [18]
[17] Technically, Samuel's technique was slightly different from Temporal Difference learning, but very close.
[18] See the sections on NeuroGammon and TD-Gammon later in this chapter.
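
The weight-correction idea described above can be illustrated with a short sketch. It is not Samuel's actual procedure, whose details the text does not give; it assumes a linear evaluation function (a weighted sum of feature values) and a simple error-proportional update, and the names evaluate, update_weights and learning_rate are hypothetical, chosen only for illustration. Each weight is nudged in proportion to how much its feature contributed to the position's score and to the gap between the backed-up score and the static evaluation.

```python
from typing import List, Sequence


def evaluate(weights: Sequence[float], features: Sequence[float]) -> float:
    """Static evaluation, assumed here to be a weighted sum of feature values."""
    return sum(w * f for w, f in zip(weights, features))


def update_weights(
    weights: Sequence[float],
    features: Sequence[float],
    backed_up_score: float,
    learning_rate: float = 0.1,  # hypothetical step size, not from the source
) -> List[float]:
    """Move the static evaluation towards the backed-up (search) score.

    The difference between the two scores is shared out among the weights
    in proportion to each feature's value, so features that contributed
    most to the score receive the largest correction.
    """
    difference = backed_up_score - evaluate(weights, features)
    return [w + learning_rate * difference * f for w, f in zip(weights, features)]


# Illustrative numbers only: three made-up features for one position.
features = [1.0, 2.0, -1.0]
weights = [0.5, 0.5, 0.5]
backed_up_score = 2.0  # score returned by searching the game tree

gap_before = backed_up_score - evaluate(weights, features)
weights = update_weights(weights, features, backed_up_score)
gap_after = backed_up_score - evaluate(weights, features)

print(f"difference before: {gap_before:.3f}, after: {gap_after:.3f}")
```

With a small enough step size, repeating this update on the same position shrinks the difference between the backed-up score and the static evaluation, which is the behaviour the passage attributes to Samuel's method.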
 