V_t(s) → V*(s) as t → ∞. An accurate evaluation is obtained by repeating the backup for every state, sweep after sweep, until |ΔV|, the largest change in any state's value during a sweep, is less than a small positive number.
Fig. 10.3. Dynamic programming method
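This stopping rule can be written down directly. Below is a minimal sketch of value iteration on a small made-up MDP; the states, transitions, rewards, discount factor, and threshold are illustrative assumptions, not taken from the text.

```python
# Value iteration on a tiny, hypothetical MDP: sweep all states and
# stop when the largest value change |delta V| drops below EPSILON.
GAMMA = 0.9     # discount factor (assumed)
EPSILON = 1e-6  # small positive convergence threshold (assumed)

# transitions[s] maps each action to a (next_state, reward) pair
transitions = {
    "s0": {"a": ("s1", 0.0), "b": ("s0", 0.0)},
    "s1": {"a": ("s2", 1.0), "b": ("s0", 0.0)},
    "s2": {"a": ("s2", 0.0)},  # absorbing state
}

V = {s: 0.0 for s in transitions}

while True:
    delta = 0.0
    for s, actions in transitions.items():
        # Bellman optimality backup: best one-step return over actions
        best = max(r + GAMMA * V[nxt] for (nxt, r) in actions.values())
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < EPSILON:  # |delta V| below a small positive number
        break

print(V)
```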
The typical dynamic programming model has limited applicability, because for many problems it is difficult to give a complete model of the environment. Simulated robot soccer is such a problem, and it can be solved by real-time dynamic programming methods. In real-time dynamic programming, the environment model does not have to be given in advance; instead, it is obtained by testing in the real environment. An artificial neural network can be used to generalize over states: the input of the network is the environment state s, and the output of the network is the evaluation of that state, V(s).
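As a concrete illustration, here is a minimal sketch of such a value network in plain NumPy. The architecture (one tanh hidden layer), the feature dimension, and the learning rate are assumptions made for the example; the text only specifies that the input is the state s and the output is V(s).

```python
# A tiny neural value approximator: state features in, scalar V(s) out.
# Sizes, learning rate, and initialization are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.1, size=(16, 4))  # 4 state features -> 16 hidden units
W2 = rng.normal(scale=0.1, size=(1, 16))  # hidden units -> scalar V(s)
ALPHA = 0.01                              # learning rate (assumed)

def predict(s):
    """Forward pass: return V(s) and the hidden activations for the update."""
    h = np.tanh(W1 @ s)
    return (W2 @ h).item(), h

def update(s, target):
    """One gradient step pulling V(s) toward a target value."""
    global W1, W2
    v, h = predict(s)
    err = target - v
    W2 += ALPHA * err * h[None, :]
    W1 += ALPHA * err * (W2.T * (1 - h[:, None] ** 2)) @ s[None, :]
```

In real-time dynamic programming the target passed to update would itself come from experience, for example an observed reward plus the discounted value estimate of the next state.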
DP methods update estimates of the values of states based on estimates of the
values of successor states. That is, they update estimates on the basis of other
estimates. We call this general idea bootstrapping. Many reinforcement learning
methods perform bootstrapping, even those that do not require, as DP requires, a
complete and accurate model of the environment. In the next chapter we explore
reinforcement learning methods that do not require a model and do not bootstrap.
In the chapter after that we explore methods that do not require a model but do
bootstrap. These key features and properties are separable, yet can be mixed in
interesting combinations.
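The bootstrapping idea fits in one line. The sketch below shows a tabular one-step update of the model-free, bootstrapping kind the text alludes to; the step size alpha and discount gamma are assumed parameters.

```python
# Bootstrapping: the new estimate of V[s] is built from the current
# estimate of the successor state's value V[s2], not from a final outcome.
def bootstrap_update(V, s, r, s2, alpha=0.1, gamma=0.9):
    V[s] += alpha * (r + gamma * V[s2] - V[s])  # target uses the estimate V[s2]
```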
10.4 Monte Carlo Methods
Monte Carlo methods are a class of computational algorithms that rely on
repeated random sampling to compute their results.
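A minimal sketch of this repeated-sampling idea is the classic estimate of π: sample points uniformly in the unit square and count the fraction that land inside the quarter circle. The sample size here is an arbitrary choice.

```python
# Monte Carlo estimation of pi by repeated random sampling.
import random

N = 1_000_000
inside = sum(1 for _ in range(N)
             if random.random() ** 2 + random.random() ** 2 <= 1.0)
print(4 * inside / N)  # approaches pi as N grows
```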