Information Technology Reference
In-Depth Information
First, the results shows that the performance for MS-TD learning clearly
outperforms that for the original TD learning. Especially, MS-TD learning
significantly improves the maximum scores and the achieved ratios of 32768-tiles
from 0% to 10.9%.
Second, the results also shows that the expectimax search in depth 5 performs the
best, even better than the one with depth 7. The result shows that higher depths larger
than 5 do not help much. Our observation is that the trained feature weights include
noise which makes it less effective for deeper search.
From above, our 2048 program can reach 32768-tiles with probability 10.9%, the
maximum score 605752, and the averaged score 328946. The program outperforms all
the known 2048 programs up to date, except for the program in [10], which heavily
relies on deep search heuristics tuned manually. Their program can reach 32768-tiles
with probability 32%, but ours runs about 100 times faster.
3.5
Discussion
For MS-TD learning, an issue to discuss is what are the optimal number of stages and
the splitting times. We did try several different kinds of stages by adding more splitting
times. For example, add , the first time when a 8192-tile is created, and ,
the first time when all of 16384-tile, 8192-tile and 4096-tile are created. However, our
experiments showed no better performance than the original three stages.
4
Conclusion
This paper proposes a new learning method, MS-TD (multi-stage TD) learning, which
improves the performance effectively, especially for maximum scores and the reaching
ratios of 32768-tiles. In our experiments, our 2048 program can reach 32768-tiles with
probability 10.9%, the maximum score 605752, and the average score 328946. The
program outperforms all the known 2048 programs up to date, except for the one in
[10], which heavily relies on deep search heuristics tuned manually and runs about 100
times slower than ours.
Interestingly, MS-TD learning can be easily applied to other 2048-like games, such
as Threes. Our program for Threes can reach 6144-tiles with probability 10%, which
outperforms all the known Threes programs up to date and also won the champion in
the 2048-bot tournament [18]. Furthermore, it is interesting and still open whether the
method can be applied to other kinds of applications, such as non-deterministic two-
player games, or even the planning applications, which can be separated into several
stages.
Acknowledgement. The authors would like to thank Ministry of Science and
Technology of the Republic of China (Taiwan) for financial support of this research
under the contract numbers NSC 102-2221-E-009-069-MY2 and 102-2221-E-009-
080-MY2, and the National Center for High-performance Computing (NCHC) for
computer time and facilities.
Search WWH ::




Custom Search