Chapter 5
The Mountain Car Problem
The mountain car problem (Moore 1990) is commonly used as a benchmark reinforcement
learning problem to evaluate learning algorithms. The problem places a
car in a valley, where the goal is to get the car to drive out of the valley (Fig. 5.1).
The car's engine is not powerful enough for it to drive out of the valley, and the car
must instead build up momentum by successively driving up opposing sides of the
valley. The state ($\mathbf{x} = [x, \dot{x}]$) of the car is defined by its position $x \in [-1.2, 0.5]$ and its velocity $\dot{x} \in [-1.5, 1.5]$, and the goal is located at $x = 0.5$. At the beginning of each episode, $x$ is uniformly randomly sampled from $[-1.2, 0.5]$ and $\dot{x} = 0$.
We define the current position and velocity of the car by $x$ and $\dot{x}$, respectively, and the position and velocity of the car at the next time step by $x'$ and $\dot{x}'$, respectively. The car's dynamics follow:
$$x' = x + \Delta t \, \dot{x}$$
$$\dot{x}' = \dot{x} + \Delta t \left( -9.8 \, m \cos(3x) + \frac{f}{m} \, a - \mu \, \dot{x} \right)$$
where $\Delta t = 0.01$ is the time step, $m = 0.02$ is the car's mass, $f = 0.2$ is the engine force, and $\mu = 0.5$ is a friction coefficient. The variable $a$ represents the action taken by the agent, where $a = -1$ for driving backwards, $a = 0$ for neutral, and $a = 1$ for driving forwards. In other words, at any discrete time step, the driver gets to choose from these three actions.
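As a rough illustration of the dynamics described above, the following Python sketch advances the car by one time step. It assumes the update takes the form $\dot{x}' = \dot{x} + \Delta t \, (-9.8 \, m \cos(3x) + (f/m) \, a - \mu \, \dot{x})$ with the constants quoted in the text. It also assumes that position and velocity are clipped to their stated ranges, which the text does not spell out. The function names `reset`, `step`, and `at_goal` are illustrative only and do not come from the source.

```python
import math
import random

# Constants as quoted in the text.
DT = 0.01        # time step (Delta t)
MASS = 0.02      # car mass m
FORCE = 0.2      # engine force f
FRICTION = 0.5   # friction coefficient mu
GRAVITY = 9.8    # gravitational constant appearing in the dynamics

X_MIN, X_MAX = -1.2, 0.5   # position range; the goal is at X_MAX
V_MIN, V_MAX = -1.5, 1.5   # velocity range


def reset(rng=random):
    """Start an episode: position uniform in [X_MIN, X_MAX], velocity 0."""
    return rng.uniform(X_MIN, X_MAX), 0.0


def step(x, v, a):
    """Advance one time step for action a in {-1, 0, 1}.

    Returns the next (position, velocity) pair.
    """
    if a not in (-1, 0, 1):
        raise ValueError("action must be -1 (backwards), 0 (neutral) or 1 (forwards)")
    accel = (-GRAVITY * MASS * math.cos(3.0 * x)
             + (FORCE / MASS) * a
             - FRICTION * v)
    x_next = x + DT * v        # position update uses the current velocity
    v_next = v + DT * accel
    # ASSUMPTION: clip to the stated state ranges (not specified in the text).
    x_next = min(max(x_next, X_MIN), X_MAX)
    v_next = min(max(v_next, V_MIN), V_MAX)
    return x_next, v_next


def at_goal(x):
    """True once the car has reached the goal position."""
    return x >= X_MAX
```

For instance, `step(0.0, 0.0, 1)` leaves the position at 0, because the position update uses the current velocity of 0, and raises the velocity to about 0.098.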
5.1 Reinforcement Learning Implementation
A three-layered neural network was used to learn the mountain car problem using
the temporal difference algorithm TD($\lambda$) (Sect. 2.2.3). The input to the network was
the state of the car defined by its position and velocity, and the output of the network
attempted to approximate the value function $V(s, a)$, or the utility of taking action $a$
when in state $s$ at time $t$. Consequently, the network had two input nodes and three
output nodes. The number of hidden nodes was varied during experimentation and
will be discussed later. The hidden layer of the network used a hyperbolic tangent