5.3.6 Partially Observed Markov Decision Problems
In real-world applications, the state is generally not completely observed.
For instance, an autonomous robot perceives only a limited part of its
environment. There is a close connection between dynamic programming in
discrete state space and optimal control in continuous state space. Therefore,
just as an estimate of the state must be reconstructed by filtering in
continuous problems, we have to face partially observed Markov decision
problems (POMDP): problems in which the state is not completely observed,
either because some state variables cannot be measured, or because the
observations are corrupted by measurement noise.
The basic principles of dynamic programming can still be applied, but the
framework is far more complex, because the policies are defined not on the
state space but on the belief state space, which is continuous: a belief state
is a probability distribution over the state space.
From an observation trajectory, the belief state is defined as the conditional
probability distribution on the state space given the observations obtained
in the past. That distribution is updated by Bayes' rule. It is therefore
possible to seek an optimal policy among the functions that are defined on
the belief state space.
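
As an illustration, here is a minimal Python sketch of this Bayes update for
a discrete POMDP. The function name belief_update and the layout of the
transition and observation arrays are assumptions made for this example, not
part of the text.

import numpy as np

def belief_update(belief, action, observation, T, O):
    # belief: prior distribution over the S states, shape (S,)
    # T: transition probabilities, T[a, s, s'] = P(s' | s, a), shape (A, S, S)
    # O: observation probabilities, O[a, s', o] = P(o | s', a), shape (A, S, M)
    # Prediction step: propagate the belief through the transition model.
    predicted = belief @ T[action]                      # shape (S,)
    # Correction step: weight by the likelihood of the received observation.
    posterior = predicted * O[action][:, observation]
    # Normalization yields the conditional probability (Bayes' rule).
    return posterior / posterior.sum()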
Unfortunately, updating the belief state requires knowledge of the model.
Therefore, the method that we have just outlined cannot be used to build
learning strategies. For this reason, partially observed Markov decision
problems will not be developed in the following sections, which are dedicated
to training; we will only mention empirical approaches that have been
implemented in practical problems. This question is an open research topic.
5.4 Reinforcement Learning and Neuro-Dynamic Programming
5.4.1 Policy Evaluation Using Monte Carlo Method and Reinforcement Learning
The dynamic programming methods that were developed in the previous
section face practical implementation difficulties in real-world problems. In
particular, the policy cost may be difficult to compute by solving a linear
system:

• The goal is to evaluate exactly the expected policy cost. If the cardinality
of the state space is very large, solving the linear system at each iteration
may have a prohibitive computational cost (see the sketch after this list).
• To write down the linear system, one must know exactly all the transition
probabilities from one state to another under the different feasible actions.
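
As an illustration, here is a minimal Python sketch of exact policy
evaluation by a direct linear solve, assuming a discounted cost criterion;
the names P_pi (transition matrix under the fixed policy), c_pi (expected
immediate costs) and gamma (discount factor) are assumptions made for this
example:

import numpy as np

def evaluate_policy(P_pi, c_pi, gamma):
    # Solve the policy-evaluation linear system J = c_pi + gamma * P_pi @ J,
    # i.e. (I - gamma * P_pi) J = c_pi, for the expected policy cost J.
    S = len(c_pi)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)

Solving this S-by-S system requires on the order of S**3 operations, which is
precisely what becomes prohibitive when the state space is very large.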