move-groups-cycle-R and move-groups-cycle-M start respectively with reveal moves and with moves of known pieces. Thus move-groups-cycle-R is more dependent on the number of revealing possibilities than move-groups-cycle-M. The reference player rand-mm simply plays randomly while pieces are unrevealed and otherwise applies minimax to find the best move. In our experiments, UCT (Upper Confidence bounds applied to Trees) values are computed with
v / (v + d) + K * sqrt(log(n) / (v + d))
with n simulations, v wins, d losses, and K = 0.3. As captures have been shown to contribute to better MCTS evaluations [1, 9], captures are preferred to random moves inside playouts.
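The selection value above can be sketched in a few lines. This is a minimal sketch, assuming the exploration term sits under a square root as in standard UCT, and that v + d > 0 (at least one decisive playout has passed through the node):

```python
import math

def uct_value(v, d, n, K=0.3):
    # Win rate over decisive playouts plus an exploration bonus:
    # v = wins, d = losses, n = total simulations, K = 0.3 as in the text.
    # Draws count toward n but not toward the v + d denominator.
    return v / (v + d) + K * math.sqrt(math.log(n) / (v + d))
```

The node maximizing this value is selected during tree descent; a larger K shifts the balance toward exploration.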
As playouts can finish in a draw and are evaluated without a heuristic function, we extended the draw rule inside playouts to 640 turns to produce more informed playouts. The results presented in all tables involve 500 games, half played with one player moving first and half with the other player moving first. Games are played with 0.01 [sec] per move and with 1 [sec] per move.
Table 1. Games against random player and random-minimax player

Policy               |  Against rand   | Against rand-mm
                     | win  lost  draw | win  lost  draw
with 0.01 [sec] per move
move-groups-random   | 194     0   306 |  90    95   315
move-groups-cycle-R  |  81     0   419 | 100   400     0
move-groups-cycle-M  | 202     0   298 | 100   150   250
group-nodes          | 314     0   186 |   1   238   261
chance-nodes         | 360     0   140 | 191    13   296
with 1 [sec] per move
move-groups-random   | 291     0   209 | 140    42   318
move-groups-cycle-R  |  64     0   436 |   0   437    63
move-groups-cycle-M  | 282     0   218 | 205    14   281
group-nodes          | 353     0   147 |  49    15   436
chance-nodes         | 393     0   107 | 249     3   248
The results in Table 1 confirm that the move-groups-cycle-R policy of cycling over moves and starting with reveal moves is not a good policy. As similar results are obtained with move-groups-random and move-groups-cycle-M, the knowledge introduced by cycling and starting with known pieces is not sufficient to do better than a random selection. The results also show that chance-nodes is more effective than the other policies with simple playouts (i.e. no heuristic evaluation function inside playouts).
In the second experiment, we enhanced these five policies by using minimax as the reference player rand-mm does. When all pieces are revealed, enhanced
 