Information Technology Reference
In-Depth Information
input
:
T
the current tree
output
:
T
expanded with
nbPlayouts
iterations
for
nbPlayouts
do
(
q
,
m
)
ₐ
select
(
T
);
if
m
=
{∅}
then
q
new
ₐ
expand
(
q
,
m
);
res ₐ
playout
(
q
new
);
backpropagate
(
q
new
,
res
);
else
res ₐ
result
(
q
);
backpropagate
(
q
,
res
);
Algorithm 3.
MCTS
applied to stochastic games
The modified main loop is presented in Alg. 3. Even if a descent can lead to
an endgame, stochastic games can lead to different events during the descent,
that should lead to unevaluated parts of
T
.The
selection
process is guided by
nodes scores and by stochastic events. Thus the statistical scoring process can be
applied systematically during
nbPlayouts
iterations, to insert 0 to
nbPlayouts
nodes in
. In such games,
expansion
,
playout
and
backpropagation
are applied
only if the selected move
m
differs from
T
.
The modified
select
function with chance-nodes is presented in Alg. 4. At
each iteration, the state of the node is checked : if it is a chance-node, then
the
dice
function adds an external event to
q
. If no move is available from
q
,
then
{∅}
,then
q
is inserted
and the move returned to apply to
q
is its first move. Thus each time this node
is selected, it will try to add another move from
q
before looking for the best
children of
q
to descend one more time in
{∅}
is returned as move
m
to apply. If
q
is not in
T
.
The modified
select
function with move-groups is presented in Alg. 5. The
process is divided between 3 cases :
T
-
there is
no move from q
, that is equivalent to no group available. The process
breaks and the tuple (
q
,
) is returned.
-
1
move-group g exists from q
, which means that this group contains at least
1 move. In this case, the process tries to evaluate all possible moves of the
move-group
g
.
-
if 2 or more move-groups exists from
q
, then the process tries to select the
first group without a move. If all groups have 1 or more moves, then the best
group
{∅}
M
best
is selected and 1 move from this group is considered. If this
move is not in
, then it returns the tuple (
q
,
m
). This process implies to
check the intersection between a move-group and
T
.The
one-move
function
defines the policy to generate and add new moves in
T
T
.
The modified
select
function with group-nodes is presented in Alg. 6. Reveal-
ing moves are regrouped by position. Each position to reveal leads to different
Search WWH ::
Custom Search