Multi-Stage Temporal Difference Learning for 2048 - Technologies and Applications of Artificial Intelligence

Information Technology Reference

In-Depth Information

Temporal Difference ( TD ) learning together with n-tuple networks for the game. They

successfully used it to reach a win rate (the ratio of reaching 2048-tiles) 97%, and

obtain the averaged score 100,178 with maximum score 261,526. However, we

observe a phenomenon: the TD learning method tends to maximize the average scores,

but becomes less motivated to reach large tiles, such as 16384 or 32768.

In this paper, we first improve their result by modifying the n-tuple networks.

However, we observe a phenomenon that the programs based on TD learning (together

with n-tuple network still hardly reach 32768-tiles (the tiles with value 32768), even

with the help of expectimax search.

Furthermore, we propose a new technique, named multi-stage TD ( MS-TD )

learning , to help reach large tiles more easily. In this learning method, we separate the

training into multiple stages similar to those used in [4]. After incorporating shallow

expectimax search, our 2048 program can reach 32768-tiles with probability 10.9%,

and obtain the maximum score 605752, and the averaged score 328946.

To the best of our knowledge, our program outperforms all the known 2048

programs up to date, except for the one in [10], which heavily relies on deep search

heuristics tuned manually. The program can reach 32768-tiles with probability 32%,

but ours runs 300-1000 moves/second, about 100 times faster than theirs, 3-4

moves/second.

More interesting, our new learning method can be easily applied to other 2048-like

games, such as Threes. Our program for Threes can reach 6144-tiles with probability

10%, which outperforms all the known Threes programs up to date and also won the

champion in the 2048-bot tournament [18]. However, due to the limit of paper size, the

research on Threes is omitted in this paper.

2

Background

This section reviews some backgrounds for this paper. First, introduce the rules of

2048. Second, we briefly review game tree search. Third, review TD learning, and

discuss three different evaluation methods and n-tuple networks for 2048 proposed

in [17].

2.1

Rules of 2048

The game 2048 is a single-player game that can be played on web pages and mobile

devices with a 4x4 board, where each cell is either empty or placed with a tile labeled

with a value which is a power of two. Let -tile denote the tile with a value .

Initially, two tiles, 2- or 4-tiles, are placed on the board at random. In each turn, the

player makes a move and then the game generates a new 2-tile with a probability of

9/10 or 4-tile with a probability of 1/10 on an empty cell chosen at random.

To make a move, the player chooses one of the four directions, up, down, left and

right. Upon choosing a direction, all the tiles move in that direction as far as they can

until they meet the border or there is already a different tile next to it. When sliding a

tile, say -tile, if the tile next to it is also a -tile, then the two tiles will be merged

into a larger tile, 2 -tile. At the same time, the player gains 2 more points in the

score. A move is legal if at least one tile can be moved.

Technologies and Applications of Artificial Intelligence

Search WWH ::

Custom Search

Home