Information Technology Reference
In-Depth Information
input : T the current tree
output : T expanded with nbPlayouts iterations
for nbPlayouts do
( q , m ) select ( T
);
if m = {∅} then
q new expand ( q , m );
res ₐ playout ( q new );
backpropagate ( q new , res );
else
res ₐ result ( q );
backpropagate ( q , res );
Algorithm 3.
MCTS
applied to stochastic games
The modified main loop is presented in Alg. 3. Even if a descent can lead to
an endgame, stochastic games can lead to different events during the descent,
that should lead to unevaluated parts of T .The selection process is guided by
nodes scores and by stochastic events. Thus the statistical scoring process can be
applied systematically during nbPlayouts iterations, to insert 0 to nbPlayouts
nodes in
. In such games, expansion , playout and backpropagation are applied
only if the selected move m differs from
T
.
The modified select function with chance-nodes is presented in Alg. 4. At
each iteration, the state of the node is checked : if it is a chance-node, then
the dice function adds an external event to q . If no move is available from q ,
then
{∅}
,then q is inserted
and the move returned to apply to q is its first move. Thus each time this node
is selected, it will try to add another move from q before looking for the best
children of q to descend one more time in
{∅}
is returned as move m to apply. If q is not in
T
.
The modified select function with move-groups is presented in Alg. 5. The
process is divided between 3 cases :
T
- there is no move from q , that is equivalent to no group available. The process
breaks and the tuple ( q ,
) is returned.
- 1 move-group g exists from q , which means that this group contains at least
1 move. In this case, the process tries to evaluate all possible moves of the
move-group g .
- if 2 or more move-groups exists from q , then the process tries to select the
first group without a move. If all groups have 1 or more moves, then the best
group
{∅}
M best is selected and 1 move from this group is considered. If this
move is not in
, then it returns the tuple ( q , m ). This process implies to
check the intersection between a move-group and
T
.The one-move function
defines the policy to generate and add new moves in
T
T
.
The modified select function with group-nodes is presented in Alg. 6. Reveal-
ing moves are regrouped by position. Each position to reveal leads to different
 
Search WWH ::




Custom Search