In most applications, the evaluation function of states can be viewed as a function of features, such as the monotonicity, the number of empty tiles, and the number of mergeable tiles [10], mentioned in Subsection 2.2. Although the true evaluation function is actually very complicated, for TD learning it is usually approximated by a linear combination of features [22], that is, $V(s) = \theta \cdot \phi(s)$, where $\phi(s)$ denotes a vector of feature occurrences in $s$, and $\theta$ denotes a vector of feature weights.
In order to correct the value $V(s)$ by the difference $\delta$, we can adjust the feature weights by a difference $\Delta\theta$ based on $\delta$, which is $\alpha \delta \nabla_{\theta} V(s)$ for linear TD(0) learning, where $\alpha$ is the learning rate. Thus, the difference $\Delta\theta$ is

$\Delta\theta = \alpha \delta \phi(s)$    (3)
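As a concrete illustration of the linear evaluation function and the update in (3), the following sketch (not from the paper; the feature extractor, learning rate, and state representation are assumptions for illustration) performs one linear TD(0) weight update.

```python
import numpy as np

def extract_features(state):
    """Hypothetical phi(s): a small vector of feature occurrences for a
    4x4 board given as a NumPy array of tile values (0 = empty).
    Here: number of empty tiles and number of adjacent equal tile pairs
    (a rough stand-in for 'mergeable tiles'), plus a constant bias."""
    empty = int(np.sum(state == 0))
    horiz = (state[:, :-1] == state[:, 1:]) & (state[:, :-1] != 0)
    vert = (state[:-1, :] == state[1:, :]) & (state[:-1, :] != 0)
    mergeable = int(horiz.sum() + vert.sum())
    return np.array([empty, mergeable, 1.0])

def td0_update(theta, state, reward, next_state, alpha=0.0025):
    """One linear TD(0) step: V(s) = theta . phi(s),
    delta = r + V(s') - V(s), and Delta theta = alpha * delta * phi(s),
    as in equation (3)."""
    phi_s = extract_features(state)
    delta = reward + theta @ extract_features(next_state) - theta @ phi_s
    return theta + alpha * delta * phi_s
```

Calling such an update repeatedly along the states visited during self-play moves $V(s)$ toward $r + V(s')$, which is the intent of the TD(0) rule.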
TD Learning for 2048. In [17], Szubert and Jaśkowski proposed TD learning for 2048. A transition from turn $t$ to $t+1$ is illustrated in Fig. 3 (below). They also proposed three methods of evaluating values for training and learning, as follows.
Fig. 3. Transition of board states
1. Evaluate actions. This method is to evaluate the function $Q(s, a)$, which stands for the expected value of taking an action $a$ on a state $s$. For 2048, an action is one of the four directions: up, down, left, and right. This is so-called Q-learning. In this case, the agent chooses the move with the highest expected score, as in the following formula,

$a_{next} \leftarrow \arg\max_{a \in A(s)} Q(s, a)$    (4)

where $A(s)$ denotes the set of legal actions on $s$.
2. Evaluate states to play. This method is to evaluate the value function $V(s)$ on states $s$ where the player is to move. As shown in Fig. 3, this method evaluates $V(s_t)$ and $V(s_{t+1})$. The agent chooses the move with the highest expected score on the next state to play, as in the following formula,

$a_{next} \leftarrow \arg\max_{a \in A(s_t)} \mathbb{E}\left[ R(s_t, a) + V(s_{t+1}) \right]$    (5)

where the expectation is taken over the random tile generated after the action. Both selection rules, (4) and (5), are sketched in code after this list.
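To make the two selection rules concrete, here is a minimal sketch of (4) and (5). The helpers legal_actions, move (returning the resulting board and the reward of the move), and tile_spawns (returning the probability distribution over next states after the random tile appears), as well as the evaluators Q and V, are assumed interfaces, not definitions from the paper.

```python
def select_action_q(state, Q, legal_actions):
    """Rule (4): choose the action with the highest Q(s, a)."""
    return max(legal_actions(state), key=lambda a: Q(state, a))

def select_action_state_value(state, V, legal_actions, move, tile_spawns):
    """Rule (5): choose the action maximizing the reward plus the expected
    value of the next state to play, averaging over the random tile
    spawned after the move."""
    def expected_score(action):
        board_after_move, reward = move(state, action)     # deterministic part
        return reward + sum(p * V(s_next)                  # average over spawns
                            for p, s_next in tile_spawns(board_after_move))
    return max(legal_actions(state), key=expected_score)
```

Rule (4) evaluates state-action pairs directly, while rule (5) requires enumerating the possible tile spawns to form the expectation.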