Multi-Stage Temporal Difference Learning for 2048 - Technologies and Applications of Artificial Intelligence

Information Technology Reference

In-Depth Information

First, the results shows that the performance for MS-TD learning clearly

outperforms that for the original TD learning. Especially, MS-TD learning

significantly improves the maximum scores and the achieved ratios of 32768-tiles

from 0% to 10.9%.

Second, the results also shows that the expectimax search in depth 5 performs the

best, even better than the one with depth 7. The result shows that higher depths larger

than 5 do not help much. Our observation is that the trained feature weights include

noise which makes it less effective for deeper search.

From above, our 2048 program can reach 32768-tiles with probability 10.9%, the

maximum score 605752, and the averaged score 328946. The program outperforms all

the known 2048 programs up to date, except for the program in [10], which heavily

relies on deep search heuristics tuned manually. Their program can reach 32768-tiles

with probability 32%, but ours runs about 100 times faster.

3.5

Discussion

For MS-TD learning, an issue to discuss is what are the optimal number of stages and

the splitting times. We did try several different kinds of stages by adding more splitting

times. For example, add , the first time when a 8192-tile is created, and ,

the first time when all of 16384-tile, 8192-tile and 4096-tile are created. However, our

experiments showed no better performance than the original three stages.

4

Conclusion

This paper proposes a new learning method, MS-TD (multi-stage TD) learning, which

improves the performance effectively, especially for maximum scores and the reaching

ratios of 32768-tiles. In our experiments, our 2048 program can reach 32768-tiles with

probability 10.9%, the maximum score 605752, and the average score 328946. The

program outperforms all the known 2048 programs up to date, except for the one in

[10], which heavily relies on deep search heuristics tuned manually and runs about 100

times slower than ours.

Interestingly, MS-TD learning can be easily applied to other 2048-like games, such

as Threes. Our program for Threes can reach 6144-tiles with probability 10%, which

outperforms all the known Threes programs up to date and also won the champion in

the 2048-bot tournament [18]. Furthermore, it is interesting and still open whether the

method can be applied to other kinds of applications, such as non-deterministic two-

player games, or even the planning applications, which can be separated into several

stages.

Acknowledgement. The authors would like to thank Ministry of Science and

Technology of the Republic of China (Taiwan) for financial support of this research

under the contract numbers NSC 102-2221-E-009-069-MY2 and 102-2221-E-009-

080-MY2, and the National Center for High-performance Computing (NCHC) for

computer time and facilities.

Search WWH ::

Custom Search

Home