Multi-Stage Temporal Difference Learning for 2048 - Technologies and Applications of Artificial Intelligence


3 Our TD Learning Method for 2048

This paper designs our TD learning method for 2048 based on the method in [17]. In our method, we first change the n-tuple network in Subsection 3.1, and then propose MS-TD learning in Subsection 3.2. The experiments for MS-TD learning are described in Subsection 3.4 and some issues are discussed in Subsection 3.5.

3.1 New N-Tuple Network

In this paper, we use the tuples shown in Fig. 4 (b) as well as all of their rotated and mirrored tuples. In brief, we change 4-tuples (1x4 lines) to 6-tuples in a green knife shape as shown in Fig. 4 (a). Apparently, the new 6-tuples cover all the original 4-tuples. The number of features increases only by a factor of two or so.
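As a rough illustration of how rotated and mirrored tuples can be generated and looked up, the following Python sketch expands one tuple shape into its eight symmetric copies on a 4x4 board. The exact knife shape is given only in Fig. 4, so the coordinates in KNIFE are a hypothetical stand-in (a full 1x4 line plus two adjacent cells). The board encoding (tile exponents with base 16), the shared weight table, and all function names are likewise assumptions made for this example, not details taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

Cell = Tuple[int, int]  # (row, col) on the 4x4 board
SIZE = 4

# Hypothetical 6-tuple: a 1x4 line plus two cells below it.
# The real shape is the one drawn in Fig. 4; this is only a placeholder.
KNIFE: List[Cell] = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1)]


def rotate(cell: Cell) -> Cell:
    """Rotate a cell 90 degrees clockwise."""
    r, c = cell
    return (c, SIZE - 1 - r)


def mirror(cell: Cell) -> Cell:
    """Reflect a cell left to right."""
    r, c = cell
    return (r, SIZE - 1 - c)


def symmetric_variants(shape: Sequence[Cell]) -> List[Tuple[Cell, ...]]:
    """Return the 8 rotated and mirrored copies of a tuple shape."""
    variants = []
    for start in (tuple(shape), tuple(mirror(c) for c in shape)):
        current = start
        for _ in range(4):
            variants.append(current)
            current = tuple(rotate(c) for c in current)
    return variants


def feature_index(board: Sequence[Sequence[int]],
                  cells: Sequence[Cell], base: int = 16) -> int:
    """Encode the tile exponents under `cells` as one table index.

    Assumes each board entry is an exponent in [0, base), with 0 for empty.
    """
    index = 0
    for r, c in cells:
        index = index * base + board[r][c]
    return index


def evaluate(board: Sequence[Sequence[int]],
             variants: Sequence[Sequence[Cell]],
             weights: Dict[int, float]) -> float:
    """Sum the weights of all symmetric copies (one shared table assumed)."""
    return sum(weights.get(feature_index(board, v), 0.0) for v in variants)
```

Under these assumptions, symmetric_variants(KNIFE) yields eight placements, and evaluate sums one table entry per placement for a given board.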

Fig. 5. Average scores in TD learning with different n-tuple networks

Fig. 6. Maximum scores in TD learning with different n-tuple networks
