Information Technology Reference
In-Depth Information
Multi-Stage Temporal Difference Learning for 2048
I-Chen Wu, Kun-Hao Yeh, Chao-Chin Liang, Chia-Chuan Chang, Han Chiang
Department of Computer Science, National Chiao Tung University, Hsinchu, Taiwan
{icwu,khyeh,chaochin,jimmy4834,passerby1023}@aigames.nctu.edu.tw
Abstract. Recently, Szubert and Jaskowski successfully used TD learning
together with n-tuple networks for playing the game 2048. In this paper, we first
improve their result by modifying the n-tuple networks. However, we observe a
phenomenon that the programs based on TD learning still hardly reach large
tiles, such as 32768-tiles (the tiles with value 32768). In this paper, we propose
a new learning method, named multi-stage TD learning, to effectively improve
the performance, especially for maximum scores and the reaching ratio of
32768-tiles. After incorporating shallow expectimax search, our 2048 program
can reach 32768-tiles with probability 10.9%, and obtain the maximum score
605752 and the averaged score 328946. To the best of our knowledge, our
program outperforms all the known 2048 programs up to date, except for the
program developed by the programmers, nicknamed nneonneo and xificurk,
which heavily relies on deep search heuristics tuned manually. The program can
reach 32768-tiles with probability 32%, but ours runs about 100 times faster.
Also interestingly, our new learning method can be easily applied to other
2048-like games, such as Threes. Our program for Threes outperforms all the
known Threes programs up to date.
Keywords: Stochastic Puzzle Game, 2048, Threes!, Temporal Difference
Learning, Expectimax.
1
Introduction
The puzzle game 2048 [7], a single-player stochastic game originated from 1024 [5]
and Threes [6], has recently become very popular over the Internet. The author
Gabriele Cirulli [11] claimed his estimation: the aggregated time of playing the game
online by players during the first three weeks after released was over 3000 years. The
game is intriguing and even addictive to human players, because the rule is simple but
hard to win. Note that players win when reaching 2048-tiles, the tiles with the value
2048. For the same reason, the game also attracted many programmers to develop
artificial intelligence (AI) programs to play it. In [17], the authors also thought that the
game is an interesting testbed for studying AI methods.
Many methods was proposed to design AI programs in the past. Most commonly
used methods were alpha-beta search [8][12], a traditional game search method for
two-player games, and expectimax search [1], a common game search method for
single-player stochastic games. Recently, Szubert and Jaskowski [17] proposed
 
Search WWH ::




Custom Search