Repeat (for each step of episode):
    Choose a from s using the policy derived from Q (e.g., ε-greedy)
    Take action a, observe r, s'
    $Q(s,a) \leftarrow Q(s,a) + \alpha \bigl[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \bigr]$
    $s \leftarrow s'$
Until s is terminal
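To make the update concrete, here is a minimal Python sketch of one tabular Q-learning episode. The environment interface (env.reset, env.step), the table sizes, and the values of α, γ, and ε are assumptions chosen for illustration, not part of the original text.

import numpy as np

# Hypothetical sizes and hyperparameters, for illustration only.
n_states, n_actions = 20, 4
alpha, gamma, epsilon = 0.1, 0.95, 0.1

Q = np.zeros((n_states, n_actions))           # tabular action-value function

def epsilon_greedy(s):
    """Choose a from s using the policy derived from Q (epsilon-greedy)."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def q_learning_episode(env):
    """Run one episode, applying the Q-learning update at every step.

    `env` is an assumed environment whose step() returns (s_next, r, done).
    """
    s = env.reset()
    done = False
    while not done:                            # repeat for each step of the episode
        a = epsilon_greedy(s)
        s_next, r, done = env.step(a)          # take action a, observe r, s'
        # Q(s,a) <- Q(s,a) + alpha*[r + gamma*max_a' Q(s',a') - Q(s,a)];
        # terminal states bootstrap to zero.
        target = r + gamma * np.max(Q[s_next]) * (not done)
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next                             # s <- s'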
10.7 Function Approximation
RL is a broad class of optimal control methods based on estimating value
functions from experience, simulation, or search. Most of the theoretical
convergence results for RL algorithms assume a tabular representation of the
value function, in which the value of each state is stored in a separate memory
location. However, most practical applications have continuous state spaces, or
very large discrete state spaces, for which such a representation is not feasible.
Thus generalization is crucial to scaling RL algorithms to real-world problems.
The kind of generalization we require is often called function approximation
because it takes examples from a desired function (e.g., a value function) and
attempts to generalize from them to construct an approximation of the entire
function. The mapping relations in RL include S ŗ A S ŗ R S×A ŗ R S×A ŗ S
and so on. The nature of function approximation in RL is to estimate these
mapping relations by parameterized functions.
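As one possible illustration of such a parameterized estimate of the mapping S → R, the sketch below approximates a state-value function over a 1-D continuous state space with a linear combination of radial-basis features and adjusts it by semi-gradient TD(0). The feature construction, dimensions, and step sizes are assumptions made for the example.

import numpy as np

def rbf_features(state, centers, width=0.5):
    """Map a continuous state to a radial-basis feature vector phi(s)."""
    return np.exp(-np.sum((centers - state) ** 2, axis=1) / (2 * width ** 2))

# Hypothetical setup: 1-D continuous states covered by 10 RBF centers.
centers = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
w = np.zeros(len(centers))                 # parameters of the approximator

def v_hat(w, state):
    """Parameterized estimate V_hat(s) = w . phi(s) of the mapping S -> R."""
    return float(np.dot(w, rbf_features(np.atleast_1d(state), centers)))

def td0_update(w, s, r, s_next, alpha=0.1, gamma=0.95):
    """Semi-gradient TD(0): adjust the parameters from one sampled transition."""
    phi = rbf_features(np.atleast_1d(s), centers)
    td_error = r + gamma * v_hat(w, s_next) - np.dot(w, phi)
    return w + alpha * td_error * phi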
Assuming that the initial value function is V_0, the sequence of value functions
produced during learning is:
$$V_0,\ \Gamma(V_0),\ \Gamma(\Gamma(V_0)),\ \Gamma(\Gamma(\Gamma(V_0))),\ \ldots$$
where Γ represents the operator defined by equation (10.8).
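For a small, fully known MDP this iteration can be written down directly. In the sketch below, Γ is taken to be the Bellman optimality backup (a common choice for this operator; equation (10.8) itself is not reproduced in this excerpt), and the transition model P, reward R, and discount factor are illustrative assumptions.

import numpy as np

# Hypothetical 3-state, 2-action MDP used only to illustrate the iteration.
P = np.array([
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],   # P[a=0, s, s']
    [[0.1, 0.9, 0.0], [0.1, 0.0, 0.9], [0.0, 0.1, 0.9]],   # P[a=1, s, s']
])
R = np.array([
    [0.0, 0.0, 1.0],      # expected reward R[a=0, s]
    [0.1, 0.1, 1.0],      # expected reward R[a=1, s]
])
gamma = 0.9

def backup(V):
    """One application of the backup operator Gamma (Bellman optimality)."""
    return np.max(R + gamma * np.einsum('ast,t->as', P, V), axis=0)

V = np.zeros(3)                        # V_0
for _ in range(50):                    # V_0, Gamma(V_0), Gamma(Gamma(V_0)), ...
    V = backup(V)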
Most traditional RL algorithms use a lookup table to store the value function,
whereas function approximation replaces the lookup table with a parameterized
function. The model of RL with function approximation is shown in Fig. 10.9.
In this model, the value function V is the objective function, V' is the
estimated function, and M : V → V' is the estimation operator. Assuming that
the initial value function is V_0, the sequence of value functions produced
during learning is:
$$V_0,\ M(V_0),\ \Gamma(M(V_0)),\ M(\Gamma(M(V_0))),\ \Gamma(M(\Gamma(M(V_0)))),\ \ldots$$
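A minimal sketch of this composed iteration, under the same kind of illustrative assumptions as before: Γ is again the Bellman optimality backup on a small assumed MDP, and M is taken here to be a least-squares fit of the backed-up values onto a small linear function class, so each pass through the loop produces the next Γ(M(·)) term of the sequence.

import numpy as np

# Illustrative 3-state, 2-action MDP (assumed, not from the text).
P = np.array([
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],
    [[0.1, 0.9, 0.0], [0.1, 0.0, 0.9], [0.0, 0.1, 0.9]],
])
R = np.array([[0.0, 0.0, 1.0], [0.1, 0.1, 1.0]])
gamma = 0.9
Phi = np.array([[1.0, 0.0],
                [1.0, 0.5],
                [1.0, 1.0]])          # hypothetical 2-feature linear class

def backup(V):
    """Backup operator Gamma: Bellman optimality backup over all states."""
    return np.max(R + gamma * np.einsum('ast,t->as', P, V), axis=0)

def estimate(V):
    """Estimation operator M: least-squares fit of V onto the linear class."""
    w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
    return Phi @ w

V = np.zeros(3)                        # V_0
for _ in range(50):                    # V_0, M(V_0), Gamma(M(V_0)), ...
    V = backup(estimate(V))            # Gamma(M(.)) applied to the current estimate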