5.3.6 Partially Observed Markov Decision Problems
In real-world applications, the state is generally not completely observed.
For instance, an autonomous robot perceives only a limited part of its
environment. There is a close connection between dynamic programming in
discrete state space and optimal control in continuous state space. Therefore,
just as an estimate of the state must be reconstructed by filtering in
continuous problems, we have to face partially observed Markov decision
problems (POMDP): problems in which the state is not completely observed,
either because some state variables cannot be measured, or because the
observations are corrupted by measurement noise.
The basic principles of dynamic programming can still be applied, but the
framework is far more complex, because the policies are defined not on the
state space but on the belief state space, which is continuous: a belief state
is a probability distribution over the state space.
From an observation trajectory, the belief state is defined as the conditional
probability distribution on the state space given the observations obtained
in the past. That distribution is updated by Bayes' rule. It is therefore
possible to seek an optimal policy among the functions that are defined on
the belief state space.
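
As an illustration, here is a minimal Python sketch of this Bayes update for
a discrete POMDP. The function name belief_update and the layout of the
transition and observation arrays are assumptions made for this example, not
part of the text.

import numpy as np

def belief_update(belief, action, observation, T, O):
    # belief: prior distribution over the S states, shape (S,)
    # T: transition probabilities, T[a, s, s'] = P(s' | s, a), shape (A, S, S)
    # O: observation probabilities, O[a, s', o] = P(o | s', a), shape (A, S, M)
    # Prediction step: propagate the belief through the transition model.
    predicted = belief @ T[action]                      # shape (S,)
    # Correction step: weight by the likelihood of the received observation.
    posterior = predicted * O[action][:, observation]
    # Normalization yields the conditional probability (Bayes' rule).
    return posterior / posterior.sum()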
Unfortunately, updating the belief state requires knowledge of the model.
Therefore, the method that we have just outlined cannot be used to build
learning strategies. For this reason, partially observed Markov decision
problems will not be developed in the following sections, which are dedicated
to training; we will only mention empirical approaches that have been
implemented in practical problems. This question is an open research topic.
5.4 Reinforcement Learning and Neuro-Dynamic Programming
5.4.1 Policy Evaluation Using Monte Carlo Method and Reinforcement Learning
The dynamic programming methods that were developed in the previous
section face practical implementation difficulties in real-world problems. In
particular, the policy cost may be difficult to compute by solving a linear
system:

• The goal is to evaluate exactly the expected policy cost. If the cardinality
of the state space is very large, solving the linear system at each iteration
may have a prohibitive computational cost (see the sketch after this list).
• To write down the linear system, one must know exactly all the transition
probabilities from one state to another under the different feasible actions.
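
As an illustration, here is a minimal Python sketch of exact policy
evaluation by a direct linear solve, assuming a discounted cost criterion;
the names P_pi (transition matrix under the fixed policy), c_pi (expected
immediate costs) and gamma (discount factor) are assumptions made for this
example:

import numpy as np

def evaluate_policy(P_pi, c_pi, gamma):
    # Solve the policy-evaluation linear system J = c_pi + gamma * P_pi @ J,
    # i.e. (I - gamma * P_pi) J = c_pi, for the expected policy cost J.
    S = len(c_pi)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, c_pi)

Solving this S-by-S system requires on the order of S**3 operations, which is
precisely what becomes prohibitive when the state space is very large.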