move-groups-cycle-R and move-groups-cycle-M start respectively with reveal moves and with moves of known pieces. Thus move-groups-cycle-R is more dependent on the number of revealing possibilities than move-groups-cycle-M. The reference player rand-mm simply plays randomly while pieces are unrevealed and otherwise applies minimax to find the best move. In our experiments, UCT (Upper Confidence bounds applied to Trees) values are computed with
v / (v + d) + K * sqrt(log(n) / (v + d))
with n simulations, v wins, d losses, and K = 0.3. As captures have been shown to contribute to better MCTS evaluations [1, 9], captures are preferred to random moves inside playouts.
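The selection value above can be sketched in a few lines. This is a minimal sketch, assuming the exploration term sits under a square root as in standard UCT, and that v + d > 0 (at least one decisive playout has passed through the node):

```python
import math

def uct_value(v, d, n, K=0.3):
    # Win rate over decisive playouts plus an exploration bonus:
    # v = wins, d = losses, n = total simulations, K = 0.3 as in the text.
    # Draws count toward n but not toward the v + d denominator.
    return v / (v + d) + K * math.sqrt(math.log(n) / (v + d))
```

The node maximizing this value is selected during tree descent; a larger K shifts the balance toward exploration.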
As playouts can finish in a draw and are evaluated without a heuristic function, we extended the draw rule inside playouts to 640 turns to produce more informed playouts. The results presented in all tables involve 500 games, half played with one player moving first and half with the other player moving first. Games are played with 0.01 [sec] per move and with 1 [sec] per move.
Table 1. Games against random player and random-minimax player

Policy               |  Against rand   | Against rand-mm
                     | win  lost  draw | win  lost  draw
with 0.01 [sec] per move
move-groups-random   | 194     0   306 |  90    95   315
move-groups-cycle-R  |  81     0   419 | 100   400     0
move-groups-cycle-M  | 202     0   298 | 100   150   250
group-nodes          | 314     0   186 |   1   238   261
chance-nodes         | 360     0   140 | 191    13   296
with 1 [sec] per move
move-groups-random   | 291     0   209 | 140    42   318
move-groups-cycle-R  |  64     0   436 |   0   437    63
move-groups-cycle-M  | 282     0   218 | 205    14   281
group-nodes          | 353     0   147 |  49    15   436
chance-nodes         | 393     0   107 | 249     3   248
The results in Table 1 confirm that the move-groups-cycle-R policy of cycling over moves and starting with reveal moves is not a good policy. As similar results are obtained with move-groups-random and move-groups-cycle-M, the knowledge introduced by cycling and starting with known pieces is not sufficient to do better than a random selection. The results also show that chance-nodes is more effective than the other policies with simple playouts (i.e. no heuristic evaluation function inside playouts).
In the second experiment, we enhanced these five policies by using minimax as the reference player rand-mm does. When all pieces are revealed, enhanced
 