Information Technology Reference
In-Depth Information
Fig. 10.9. RL model with function approximation
As in Q-learning, the equations of RL with function approximation are as
follows.
$$Q(s,a) = (1-\alpha)\,V(s,a) + \alpha\Bigl(r(s,a,s') + \max_{a'} V(s',a')\Bigr) \tag{10.19}$$

$$V(s,a) = M\bigl(Q(s,a)\bigr) \tag{10.20}$$
In RL with function approximation, two iterative processes run
simultaneously. One is the iteration of the value function (Eq. 10.19). The other
is the approximation of the value function by M (Eq. 10.20). The correctness and
convergence of the approximation process M play the key role in RL. Function
approximation is an instance of supervised learning, the primary topic studied in
machine learning, artificial neural networks, pattern recognition, and statistical
curve fitting. Typical approximation techniques include state aggregation,
function interpolation, and artificial neural networks.
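The interplay of the two processes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `LinearM`, the helpers `backup`, `step` and `one_hot`, the linear form of M, and the single gradient step used as the supervised fit are all hypothetical choices, not something the text prescribes. The backup follows Eq. 10.19 as written, so it carries no discount factor.

```python
class LinearM:
    """Hypothetical approximator M: V(s, a) = w . phi(s, a) (an assumption)."""

    def __init__(self, phi, n_features, lr=0.1):
        self.phi = phi                  # feature map phi(s, a) -> list of floats
        self.w = [0.0] * n_features     # weights, initialised to zero
        self.lr = lr                    # step size of the supervised fit

    def value(self, s, a):
        return sum(w * x for w, x in zip(self.w, self.phi(s, a)))

    def fit(self, s, a, target):
        """One gradient step on the squared error toward the target Q(s, a)."""
        x = self.phi(s, a)
        err = target - self.value(s, a)
        self.w = [w + self.lr * err * xi for w, xi in zip(self.w, x)]


def backup(M, s, a, r, s_next, actions, alpha):
    """Eq. 10.19: Q(s,a) = (1 - alpha) V(s,a) + alpha (r + max_a' V(s',a'))."""
    best_next = max(M.value(s_next, b) for b in actions)
    return (1 - alpha) * M.value(s, a) + alpha * (r + best_next)


def step(M, s, a, r, s_next, actions, alpha=0.5):
    """Run both processes once: the value backup, then the approximation fit."""
    q = backup(M, s, a, r, s_next, actions, alpha)
    M.fit(s, a, q)                      # approximate counterpart of Eq. 10.20
    return q


def one_hot(s, a, n_states=2, n_actions=2):
    """Illustrative features: one indicator per (state, action) pair."""
    x = [0.0] * (n_states * n_actions)
    x[s * n_actions + a] = 1.0
    return x


M = LinearM(one_hot, n_features=4)
q = step(M, s=0, a=0, r=1.0, s_next=1, actions=[0, 1])
print(q, M.value(0, 0))
```

With one-hot features the approximator reduces to a table, which makes the update easy to follow by hand. Richer feature maps share weights across states, and that sharing is where the correctness and convergence of M start to matter.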
Aggregation is an intuitive and widely applicable technique for solving
large-scale problems. In state aggregation, the state space of the Markov chain is
partitioned, and the states belonging to the same partition subset are aggregated
into one meta-state. The Markov chain is said to be lumpable if the transition
process among meta-states is Markovian for every probability distribution of the
initial state of the original Markov chain, and weakly lumpable if the transition
process among meta-states is Markovian only for some initial probability
distributions. It has been proved that function approximation with state
aggregation converges. However, the value it converges to may not be the optimal
value, and reaching the optimal value may take too many steps. Thus, state
aggregation also suffers from the curse of dimensionality for large MDP problems.
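Lumpability can be checked directly from the transition matrix. The sketch below uses the standard row-sum criterion: a chain is lumpable with respect to a partition when, for every pair of blocks, all states in the source block have the same total probability of moving into the target block. The function name `aggregated_chain` and the example matrix are illustrative assumptions. Weak lumpability is not covered here.

```python
def aggregated_chain(P, partition, tol=1e-9):
    """Return the meta-state transition matrix if P is lumpable with respect
    to `partition` (a list of lists of state indices), otherwise None."""
    lumped = []
    for block in partition:
        row = []
        for target in partition:
            # Probability of jumping into `target` from each state in `block`.
            sums = [sum(P[i][j] for j in target) for i in block]
            if max(sums) - min(sums) > tol:
                return None             # states in the block disagree
            row.append(sums[0])
        lumped.append(row)
    return lumped


# Illustrative three-state chain (each row sums to 1).
P = [
    [0.5, 0.25, 0.25],
    [0.2, 0.4, 0.4],
    [0.2, 0.3, 0.5],
]

merged = aggregated_chain(P, [[0], [1, 2]])      # states 1 and 2 form one meta-state
not_lumpable = aggregated_chain(P, [[0, 1], [2]])
print(merged, not_lumpable)
```

In this example, states 1 and 2 both move to state 0 with probability 0.2, so they can be merged into one meta-state. States 0 and 1 reach state 2 with different probabilities (0.25 and 0.4), so the second partition is rejected.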
Function approximation with artificial neural networks is currently the subject
of much research. Though these new methods can greatly speed up learning,