Reinforcement Learning - Advanced Artificial Intelligence

obtained by playing some number of games against the opponent. This evaluation would then direct which policy or policies were considered next. A typical evolutionary method would hill-climb in policy space, successively generating and evaluating policies in an attempt to obtain incremental improvements. Or, perhaps, a genetic-style algorithm could be used that would maintain and evaluate a population of policies. Literally hundreds of different optimization methods could be applied. By directly searching the policy space we mean that entire policies are proposed and compared on the basis of scalar evaluations.
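
The hill-climbing idea described above can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that are not in the text: a policy is represented as a fixed preference order over the nine squares (we always take the first free square in that order), the opponent moves uniformly at random, we play X and move first, and a neighbouring policy is produced by swapping two entries of the order. The names play_game, evaluate and hill_climb are hypothetical helpers chosen for this example.

```python
import random

# All eight three-in-a-row lines on a 3x3 board indexed 0..8.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != ' ' and board[a] == board[b] == board[c]:
            return board[a]
    return None


def play_game(policy, rng):
    """Play one game. We are 'X', move first, and follow `policy`.

    Assumption: the opponent ('O') picks a free square uniformly at random.
    Returns 1.0 for a win and 0.0 for a draw or a loss.
    """
    board = [' '] * 9
    for turn in range(9):
        free = [i for i in range(9) if board[i] == ' ']
        if turn % 2 == 0:
            # Our policy: take the first free square in our preference order.
            move = next(sq for sq in policy if board[sq] == ' ')
            board[move] = 'X'
        else:
            board[rng.choice(free)] = 'O'
        w = winner(board)
        if w is not None:
            return 1.0 if w == 'X' else 0.0
    return 0.0  # board full with no winner: a draw


def evaluate(policy, n_games, rng):
    """Scalar evaluation of an entire policy: the fraction of games won."""
    return sum(play_game(policy, rng) for _ in range(n_games)) / n_games


def hill_climb(n_steps=200, n_games=200, seed=0):
    """Propose a neighbouring policy; keep it if it evaluates at least as well."""
    rng = random.Random(seed)
    policy = list(range(9))
    rng.shuffle(policy)
    score = evaluate(policy, n_games, rng)
    for _ in range(n_steps):
        candidate = policy[:]
        i, j = rng.sample(range(9), 2)  # two distinct positions to swap
        candidate[i], candidate[j] = candidate[j], candidate[i]
        cand_score = evaluate(candidate, n_games, rng)
        if cand_score >= score:
            policy, score = candidate, cand_score
    return policy, score


if __name__ == "__main__":
    best_policy, best_score = hill_climb()
    print("preference order:", best_policy)
    print("estimated win rate:", best_score)
```

Note that each candidate is judged only by a single number, its estimated win rate over whole games; nothing that happens inside an individual game is used. Because that estimate is noisy, the reported score of the final policy is likely to be somewhat optimistic.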

[Figure: levels labeled alternately "our move" and "opponent's move", ending in a win.]

Fig. 10.6. A sequence of tic-tac-toe moves. The solid lines represent the moves taken during a game; the dashed lines represent moves that we (our RL player) considered but did not make.

Here is how the tic-tac-toe problem would be approached using reinforcement learning and approximate value functions. First we set up a table of numbers, one
