How Computers Play Games - Robots Unlimited: Life in a Virtual Age

Figure 25. Arthur Samuel with IBM computer (Courtesy of IBM Corporate Archives)

The second method of learning devised by Samuel was designed to enable the evaluation function to improve itself by improving the weightings assigned to each of its features. The method is based on the realisation that the backed-up score for the root position in the game tree should ideally be the same as the score found when the evaluation function is applied directly to that same position. During play, Samuel's program would keep track of how much each of the features in its evaluation function had contributed to the overall score for a position, and by how much the backed-up score differed from the static evaluation of the same position. These differences were used to correct the weightings for each of the features in the evaluation function, which tended to make future differences smaller. This particular approach is today called Temporal Difference learning [17] and was employed, almost 40 years later, by another IBM researcher, Gerald Tesauro, in his world-class backgammon program. [18]
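The weight correction described above can be illustrated with a small sketch. It is not Samuel's actual procedure: it assumes a linear evaluation function (a weighted sum of feature values) and uses a simple error-proportional update, and the names static_eval, update_weights and learning_rate, as well as all the numbers, are hypothetical choices made for the example.

```python
from typing import List


def static_eval(weights: List[float], features: List[float]) -> float:
    """Score a position as a weighted sum of its feature values."""
    return sum(w * f for w, f in zip(weights, features))


def update_weights(
    weights: List[float],
    features: List[float],
    backed_up_score: float,
    learning_rate: float = 0.01,  # assumed step size, not from the source
) -> List[float]:
    """Nudge each weight so the static score moves toward the backed-up score.

    Each weight is corrected in proportion to the difference between the
    two scores and to the value of its own feature in this position.
    """
    error = backed_up_score - static_eval(weights, features)
    return [w + learning_rate * error * f for w, f in zip(weights, features)]


# Illustrative numbers only: three features of one root position.
weights = [1.0, 0.5, 0.2]
features = [2.0, -1.0, 3.0]
backed_up = 3.0  # score returned by the tree search for the same position

before = abs(backed_up - static_eval(weights, features))
weights = update_weights(weights, features, backed_up)
after = abs(backed_up - static_eval(weights, features))

print(f"difference before: {before:.3f}, after: {after:.3f}")
```

After one correction the gap between the backed-up score and the static evaluation of the same position is smaller than before, which is the behaviour the text describes: repeated corrections tend to make future differences smaller.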

[17] Technically, Samuel's technique was slightly different from Temporal Difference learning, but very close.

[18] See the sections on NeuroGammon and TD-Gammon later in this chapter.
