Information Technology Reference
In-Depth Information
important directions of RL. At present, the main technical difficulty is: how to
prove and guarantee the convergence of learning algorithm from theoretical
aspects. The development of effective models for complex MDP will also be
important direction in the future.
Exercises
1. Given a brief description for the main branches of reinforcement learning and
its research history.
2. Explain the similarities and differences between reinforcement learning
models and other machine learning methods.
3. Explain the decision process of MDP and its essence.
4. Given the basic ideas of Monte Carlo methods and its applications in
reinforcement learning.
5. Given the basic ideas of Temporal-difference (TD) learning and illustrate its
process considering playing the game of tic-tac-toe.
6. Consider the deterministic grid world shown below with the absorbing
goal-state G. Here the immediate rewards are in the figure for the labeled
transitions and 0 for all unlabeled transitions. Given the
V
*
value for every
Q
(
s
,
a
)
state in this grid world. Given the
value for every transition. Finally,
γ
=
0
show an optimal policy using
.
10
12
14
G
Search WWH ::




Custom Search