Information Technology Reference
In-Depth Information
Temporal Difference ( TD ) learning together with n-tuple networks for the game. They
successfully used it to reach a win rate (the ratio of reaching 2048-tiles) 97%, and
obtain the averaged score 100,178 with maximum score 261,526. However, we
observe a phenomenon: the TD learning method tends to maximize the average scores,
but becomes less motivated to reach large tiles, such as 16384 or 32768.
In this paper, we first improve their result by modifying the n-tuple networks.
However, we observe a phenomenon that the programs based on TD learning (together
with n-tuple network still hardly reach 32768-tiles (the tiles with value 32768), even
with the help of expectimax search.
Furthermore, we propose a new technique, named multi-stage TD ( MS-TD )
learning , to help reach large tiles more easily. In this learning method, we separate the
training into multiple stages similar to those used in [4]. After incorporating shallow
expectimax search, our 2048 program can reach 32768-tiles with probability 10.9%,
and obtain the maximum score 605752, and the averaged score 328946.
To the best of our knowledge, our program outperforms all the known 2048
programs up to date, except for the one in [10], which heavily relies on deep search
heuristics tuned manually. The program can reach 32768-tiles with probability 32%,
but ours runs 300-1000 moves/second, about 100 times faster than theirs, 3-4
moves/second.
More interesting, our new learning method can be easily applied to other 2048-like
games, such as Threes. Our program for Threes can reach 6144-tiles with probability
10%, which outperforms all the known Threes programs up to date and also won the
champion in the 2048-bot tournament [18]. However, due to the limit of paper size, the
research on Threes is omitted in this paper.
2
Background
This section reviews some backgrounds for this paper. First, introduce the rules of
2048. Second, we briefly review game tree search. Third, review TD learning, and
discuss three different evaluation methods and n-tuple networks for 2048 proposed
in [17].
2.1
Rules of 2048
The game 2048 is a single-player game that can be played on web pages and mobile
devices with a 4x4 board, where each cell is either empty or placed with a tile labeled
with a value which is a power of two. Let -tile denote the tile with a value .
Initially, two tiles, 2- or 4-tiles, are placed on the board at random. In each turn, the
player makes a move and then the game generates a new 2-tile with a probability of
9/10 or 4-tile with a probability of 1/10 on an empty cell chosen at random.
To make a move, the player chooses one of the four directions, up, down, left and
right. Upon choosing a direction, all the tiles move in that direction as far as they can
until they meet the border or there is already a different tile next to it. When sliding a
tile, say -tile, if the tile next to it is also a -tile, then the two tiles will be merged
into a larger tile, 2 -tile. At the same time, the player gains 2 more points in the
score. A move is legal if at least one tile can be moved.
Search WWH ::




Custom Search