Repeat (for each step of episode):
    Choose a from s using the policy derived from Q (e.g., ε-greedy)
    Take action a, observe r, s'
    $Q(s,a) \leftarrow Q(s,a) + \alpha \bigl[ r + \gamma \max_{a'} Q(s',a') - Q(s,a) \bigr]$
    $s \leftarrow s'$
Until s is terminal
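To make the update concrete, here is a minimal Python sketch of one tabular Q-learning episode. The environment interface (env.reset, env.step), the table sizes, and the values of α, γ, and ε are assumptions chosen for illustration, not part of the original text.

import numpy as np

# Hypothetical sizes and hyperparameters, for illustration only.
n_states, n_actions = 20, 4
alpha, gamma, epsilon = 0.1, 0.95, 0.1

Q = np.zeros((n_states, n_actions))           # tabular action-value function

def epsilon_greedy(s):
    """Choose a from s using the policy derived from Q (epsilon-greedy)."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def q_learning_episode(env):
    """Run one episode, applying the Q-learning update at every step.

    `env` is an assumed environment whose step() returns (s_next, r, done).
    """
    s = env.reset()
    done = False
    while not done:                            # repeat for each step of the episode
        a = epsilon_greedy(s)
        s_next, r, done = env.step(a)          # take action a, observe r, s'
        # Q(s,a) <- Q(s,a) + alpha*[r + gamma*max_a' Q(s',a') - Q(s,a)];
        # terminal states bootstrap to zero.
        target = r + gamma * np.max(Q[s_next]) * (not done)
        Q[s, a] += alpha * (target - Q[s, a])
        s = s_next                             # s <- s'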
10.7 Function Approximation
RL is a broad class of optimal control methods based on estimating value
functions from experience, simulation, or search. Most of the theoretical
convergence results for RL algorithms assume a tabular representation of the
value function, in which the value of each state is stored in a separate memory
location. However, most practical applications have continuous state spaces, or
very large discrete state spaces, for which such a representation is not feasible.
Thus generalization is crucial to scaling RL algorithms to real-world problems.
The kind of generalization we require is often called function approximation
because it takes examples from a desired function (e.g., a value function) and
attempts to generalize from them to construct an approximation of the entire
function. The mapping relations in RL include S ŗ A S ŗ R S×A ŗ R S×A ŗ S
and so on. The nature of function approximation in RL is to estimate these
mapping relations by parameterized functions.
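As one possible illustration of such a parameterized estimate of the mapping S → R, the sketch below approximates a state-value function over a 1-D continuous state space with a linear combination of radial-basis features and adjusts it by semi-gradient TD(0). The feature construction, dimensions, and step sizes are assumptions made for the example.

import numpy as np

def rbf_features(state, centers, width=0.5):
    """Map a continuous state to a radial-basis feature vector phi(s)."""
    return np.exp(-np.sum((centers - state) ** 2, axis=1) / (2 * width ** 2))

# Hypothetical setup: 1-D continuous states covered by 10 RBF centers.
centers = np.linspace(0.0, 1.0, 10).reshape(-1, 1)
w = np.zeros(len(centers))                 # parameters of the approximator

def v_hat(w, state):
    """Parameterized estimate V_hat(s) = w . phi(s) of the mapping S -> R."""
    return float(np.dot(w, rbf_features(np.atleast_1d(state), centers)))

def td0_update(w, s, r, s_next, alpha=0.1, gamma=0.95):
    """Semi-gradient TD(0): adjust the parameters from one sampled transition."""
    phi = rbf_features(np.atleast_1d(s), centers)
    td_error = r + gamma * v_hat(w, s_next) - np.dot(w, phi)
    return w + alpha * td_error * phi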
Assuming that the initial value function is V_0, the sequence of value functions
produced during learning is:
$$V_0,\ \Gamma(V_0),\ \Gamma(\Gamma(V_0)),\ \Gamma(\Gamma(\Gamma(V_0))),\ \ldots$$
where Γ represents the operator defined by equation (10.8).
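For a small, fully known MDP this iteration can be written down directly. In the sketch below, Γ is taken to be the Bellman optimality backup (a common choice for this operator; equation (10.8) itself is not reproduced in this excerpt), and the transition model P, reward R, and discount factor are illustrative assumptions.

import numpy as np

# Hypothetical 3-state, 2-action MDP used only to illustrate the iteration.
P = np.array([
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],   # P[a=0, s, s']
    [[0.1, 0.9, 0.0], [0.1, 0.0, 0.9], [0.0, 0.1, 0.9]],   # P[a=1, s, s']
])
R = np.array([
    [0.0, 0.0, 1.0],      # expected reward R[a=0, s]
    [0.1, 0.1, 1.0],      # expected reward R[a=1, s]
])
gamma = 0.9

def backup(V):
    """One application of the backup operator Gamma (Bellman optimality)."""
    return np.max(R + gamma * np.einsum('ast,t->as', P, V), axis=0)

V = np.zeros(3)                        # V_0
for _ in range(50):                    # V_0, Gamma(V_0), Gamma(Gamma(V_0)), ...
    V = backup(V)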
Most traditional RL algorithms use a lookup table to store the value function,
whereas function approximation replaces the lookup table with a parameterized
function. The model of RL with function approximation is shown in Fig. 10.9.
In this model, the value function V is the objective function, V' is the
estimated function, and M : V → V' is the estimation operator. Assuming that
the initial value function is V_0, the sequence of value functions produced
during learning is:
$$V_0,\ M(V_0),\ \Gamma(M(V_0)),\ M(\Gamma(M(V_0))),\ \Gamma(M(\Gamma(M(V_0)))),\ \ldots$$
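A minimal sketch of this composed iteration, under the same kind of illustrative assumptions as before: Γ is again the Bellman optimality backup on a small assumed MDP, and M is taken here to be a least-squares fit of the backed-up values onto a small linear function class, so each pass through the loop produces the next Γ(M(·)) term of the sequence.

import numpy as np

# Illustrative 3-state, 2-action MDP (assumed, not from the text).
P = np.array([
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],
    [[0.1, 0.9, 0.0], [0.1, 0.0, 0.9], [0.0, 0.1, 0.9]],
])
R = np.array([[0.0, 0.0, 1.0], [0.1, 0.1, 1.0]])
gamma = 0.9
Phi = np.array([[1.0, 0.0],
                [1.0, 0.5],
                [1.0, 1.0]])          # hypothetical 2-feature linear class

def backup(V):
    """Backup operator Gamma: Bellman optimality backup over all states."""
    return np.max(R + gamma * np.einsum('ast,t->as', P, V), axis=0)

def estimate(V):
    """Estimation operator M: least-squares fit of V onto the linear class."""
    w, *_ = np.linalg.lstsq(Phi, V, rcond=None)
    return Phi @ w

V = np.zeros(3)                        # V_0
for _ in range(50):                    # V_0, M(V_0), Gamma(M(V_0)), ...
    V = backup(estimate(V))            # Gamma(M(.)) applied to the current estimate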