obtained by playing some number of games against the opponent. This
evaluation would then direct which policy or policies were considered next. A
typical evolutionary method would hill-climb in policy space, successively
generating and evaluating policies in an attempt to obtain incremental
improvements. Or, perhaps, a genetic-style algorithm could be used that would
maintain and evaluate a population of policies. Literally hundreds of different
optimization methods could be applied. By directly searching the policy space we
mean that entire policies are proposed and compared on the basis of scalar
evaluations.
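The hill-climbing idea above can be sketched in a few lines of Python. This is a minimal, hedged illustration, not the author's method. Several choices are assumptions not taken from the text. The opponent plays uniformly at random. A policy is restricted to a fixed preference order over the nine squares, and the player always takes the first empty square in that order. Each policy's scalar evaluation is the fraction of a fixed number of games it wins. The function names (`winner`, `play_game`, `evaluate`, `hill_climb`) are hypothetical.

```python
import random

# The eight winning lines of a 3x3 board, squares indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def play_game(policy, rng):
    """Play one game and return 1 for a win, 0 for a loss or draw.

    Assumptions: our policy plays X and moves first, and the opponent
    picks uniformly at random among the empty squares.
    """
    board = [" "] * 9
    for turn in range(9):
        if turn % 2 == 0:
            # Our move: the first empty square in the preference order.
            square = next(s for s in policy if board[s] == " ")
            board[square] = "X"
        else:
            empty = [s for s in range(9) if board[s] == " "]
            board[rng.choice(empty)] = "O"
        w = winner(board)
        if w is not None:
            return 1 if w == "X" else 0
    return 0  # board full with no winner: a draw


def evaluate(policy, n_games, seed):
    """Scalar evaluation of an entire policy: its win fraction over n_games.

    Reusing the same seed makes the evaluation deterministic for a given policy.
    """
    rng = random.Random(seed)
    return sum(play_game(policy, rng) for _ in range(n_games)) / n_games


def hill_climb(steps=100, n_games=200, seed=0):
    """Hill-climb in policy space: propose a neighbouring policy by swapping
    two squares in the preference order, and keep it if its evaluation
    does not drop."""
    rng = random.Random(seed)
    policy = list(range(9))
    rng.shuffle(policy)
    best_score = evaluate(policy, n_games, seed)
    for _ in range(steps):
        candidate = policy[:]
        i, j = rng.sample(range(9), 2)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        score = evaluate(candidate, n_games, seed)
        if score >= best_score:
            policy, best_score = candidate, score
    return policy, best_score


if __name__ == "__main__":
    policy, score = hill_climb()
    print("preference order:", policy)
    print("estimated win rate vs. random opponent:", score)
```

Accepting ties (`>=`) lets the search drift across plateaus where many policies score the same. A genetic-style variant would keep a population of such preference orders and breed the best ones, instead of tracking a single policy.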
[Figure: alternating "our move" and "opponent's move" positions, ending in a WIN]
Fig. 10.6. A sequence of tic-tac-toe moves. The solid lines represent the moves taken during a
game; the dashed lines represent moves that we (our RL player) considered but did not make.
Here is how the tic-tac-toe problem would be approached using reinforcement
learning and approximate value functions. First we set up a table of numbers, one