Multi-Stage Temporal Difference Learning for 2048 - Technologies and Applications of Artificial Intelligence


to perform well in the first stage, but may not in the remaining stages, for the following reason. The greater number of large tiles in these stages makes the game harder to play, so the same feature weights cannot accurately reflect the expected scores under this increased difficulty. Using another set of feature weights in the next stage therefore makes it more likely that the feature weights reflect the expected scores. In the next subsection, this observation is supported by experiments that show significant improvements for 2048.
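The idea of keeping one set of feature weights per stage can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the stage thresholds in STAGE_THRESHOLDS, the flat-list board representation, the string feature keys, and the learning rate are hypothetical and are not taken from the text.

```python
from collections import defaultdict

# Hypothetical stage boundaries (an assumption): a board moves to the next
# stage once its largest tile reaches the corresponding threshold.
STAGE_THRESHOLDS = (8192, 16384)


def stage_of(board):
    """Return the 0-based stage index of a board given as a flat list of tile values."""
    largest = max(board)
    return sum(largest >= t for t in STAGE_THRESHOLDS)


class MultiStageWeights:
    """Keeps an independent feature-weight table for every stage."""

    def __init__(self, num_stages):
        self.tables = [defaultdict(float) for _ in range(num_stages)]

    def value(self, board, features):
        # Only the table of the board's own stage contributes to the estimate.
        table = self.tables[stage_of(board)]
        return sum(table[f] for f in features)

    def update(self, board, features, td_error, alpha=0.01):
        # alpha is an arbitrary illustrative learning rate, not a value from the text.
        table = self.tables[stage_of(board)]
        for f in features:
            table[f] += alpha * td_error


weights = MultiStageWeights(num_stages=len(STAGE_THRESHOLDS) + 1)
early_board = [2, 4, 8, 16] + [0] * 12
late_board = [8192, 4, 8, 16] + [0] * 12
weights.update(early_board, ["f1", "f2"], td_error=10.0, alpha=0.1)
```

In this sketch, an update made on a first-stage board leaves the weights used for later-stage boards untouched, so each table only has to fit the expected scores of its own stage.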

3.3 Experiments for MS-TD Learning

In the experiment for MS-TD learning, 5 million training games were run in each stage, and the average and maximum scores were sampled every 1000 games.
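The sampling of average and maximum scores could be computed along the following lines. This is a sketch under the assumption that each sample summarizes the most recent block of 1000 games; the function name sample_scores and the list-of-scores input are hypothetical.

```python
def sample_scores(scores, interval=1000):
    """Return (games_played, average, maximum) for each full block of `interval` scores."""
    samples = []
    for start in range(0, len(scores) - interval + 1, interval):
        block = scores[start:start + interval]
        samples.append((start + interval, sum(block) / interval, max(block)))
    return samples


# Toy data standing in for per-game final scores (not experimental results).
toy_scores = list(range(2000))
toy_samples = sample_scores(toy_scores, interval=1000)
```

Each tuple gives one point for a plot of average scores and one for a plot of maximum scores, indexed by the number of games played so far.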

Fig. 7. Average scores in MS-TD learning

Fig. 8. Maximum scores in MS-TD learning
