The action “no acceleration” generally leads to reduced speed; however, on level
or downhill stretches, it can lead to constant or even increased speed. For instance,
for $s_2$, we can specify

$$p^{a_0}_{s_2 s_1} = 0.75, \qquad p^{a_0}_{s_2 s_2} = 0.2, \qquad p^{a_0}_{s_2 s_3} = 0.05.$$
So if we drive at 90 km/h and do not accelerate, the probability that the speed
will reduce to 80 km/h is 75 %, that it will remain at 90 km/h is 20 %, and that it
will increase to 100 km/h is 5 %. Remember that in accordance with (3.2), the
probabilities must add up to 100 %. Similarly, for the remaining states $s_1$ and $s_3$, we
can define

$$p^{a_0}_{s_1 s_1} = 0.7, \qquad p^{a_0}_{s_1 s_2} = 0.3, \qquad p^{a_0}_{s_3 s_2} = 0.9, \qquad p^{a_0}_{s_3 s_3} = 0.1.$$
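To see the structure at a glance, we can gather these values for the action "no acceleration" ($a_0$) in a small transition table. The following Python/NumPy sketch is purely illustrative (the array and index names are ours, not part of the text); it also checks that every row sums to 1, as demanded by (3.2):

import numpy as np

# States: s1 = 80 km/h, s2 = 90 km/h, s3 = 100 km/h (row/column indices 0, 1, 2).
# P_a0[i, j] = probability of moving from state i to state j under "no acceleration".
P_a0 = np.array([
    [0.70, 0.30, 0.00],   # from s1
    [0.75, 0.20, 0.05],   # from s2
    [0.00, 0.90, 0.10],   # from s3
])

# In accordance with (3.2), every row must sum to 1.
assert np.allclose(P_a0.sum(axis=1), 1.0)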
The action “acceleration” of course has precisely the inverse effect. We start
once again with the specification for
$s_2$:

$$p^{a_1}_{s_2 s_1} = 0.1, \qquad p^{a_1}_{s_2 s_2} = 0.2, \qquad p^{a_1}_{s_2 s_3} = 0.7.$$
So if we drive at 90 km/h and accelerate, the probability that the speed will
increase to 100 km/h is 70 %, that it will remain at 90 km/h is 20 %, and that it
will decrease to 80 km/h is 10 %. Similarly, for the remaining states
$s_1$ and $s_3$, we
can define

$$p^{a_1}_{s_1 s_1} = 0.3, \qquad p^{a_1}_{s_1 s_2} = 0.7, \qquad p^{a_1}_{s_3 s_2} = 0.1, \qquad p^{a_1}_{s_3 s_3} = 0.9.$$
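Adding the values for "acceleration" ($a_1$) gives the complete transition model of our small driving environment. Again a purely illustrative sketch with our own names, not code from the text; as a usage example, one successor state is sampled from the model:

import numpy as np

# P[a][s, s'] for both actions; rows and columns are ordered s1, s2, s3.
P = {
    "a0": np.array([[0.70, 0.30, 0.00],    # no acceleration
                    [0.75, 0.20, 0.05],
                    [0.00, 0.90, 0.10]]),
    "a1": np.array([[0.30, 0.70, 0.00],    # acceleration
                    [0.10, 0.20, 0.70],
                    [0.00, 0.10, 0.90]]),
}

rng = np.random.default_rng(0)
state = 1                                     # start in s2 (90 km/h)
next_state = rng.choice(3, p=P["a1"][state])  # sample the successor under "acceleration"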
In so doing, we have adequately described our environment.
■
3.5 The Bellman Equation
We first define an MDP as a quadruple $M := (S, A, P, R)$ of the state and action
spaces $S$ and $A$, the transition probabilities $P$, and the rewards $R$. Please note that the
Markov property need not be explicitly stipulated to hold, since it implicitly follows
from the given representations of $P$ and $R$.
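Purely as an illustration (the class and field names are our own assumptions, not notation from the text), such a quadruple can be mirrored one-to-one in code:

from dataclasses import dataclass
from typing import Dict, List
import numpy as np

@dataclass
class MDP:
    """M := (S, A, P, R): state space, action space, transition probabilities, rewards."""
    states: List[str]          # S, e.g. ["s1", "s2", "s3"]
    actions: List[str]         # A, e.g. ["a0", "a1"]
    P: Dict[str, np.ndarray]   # P[a][s, s'] = transition probability under action a
    R: Dict[str, np.ndarray]   # R[a][s, s'] = reward for the transition (assumed form)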
Each policy $\pi(s, a)$ induces a Markov chain (MC), which is characterized by the
tuple $M_\pi := (S, P_\pi)$, where $P_\pi = \left(p^{\pi}_{s s'}\right)_{s, s' \in S}$ denotes the transition probabilities
that result from following the policy $\pi(s, a)$:
$$p^{\pi}_{s s'} = \sum_{a \in A(s)} \pi(s, a)\, p^{a}_{s s'} \qquad (3.3)$$
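Equation (3.3) is nothing more than an action-weighted average of the action-dependent transition probabilities. A small illustrative sketch (function and variable names are ours) computes the induced matrix $P_\pi$ for a stochastic policy in our driving example:

import numpy as np

def induced_transition_matrix(P_a, pi, actions):
    """Compute P_pi[s, s'] = sum_a pi[s, a] * P_a[a][s, s']  (cf. Eq. 3.3)."""
    P_pi = np.zeros_like(next(iter(P_a.values())))
    for k, a in enumerate(actions):
        P_pi += pi[:, [k]] * P_a[a]   # weight each row of P_a[a] by pi(s, a)
    return P_pi

# Example policy: always accelerate in s1, 50/50 in s2, never accelerate in s3.
actions = ["a0", "a1"]
pi = np.array([[0.0, 1.0],
               [0.5, 0.5],
               [1.0, 0.0]])
P_a = {
    "a0": np.array([[0.70, 0.30, 0.00],
                    [0.75, 0.20, 0.05],
                    [0.00, 0.90, 0.10]]),
    "a1": np.array([[0.30, 0.70, 0.00],
                    [0.10, 0.20, 0.70],
                    [0.00, 0.10, 0.90]]),
}
P_pi = induced_transition_matrix(P_a, pi, actions)
assert np.allclose(P_pi.sum(axis=1), 1.0)   # rows of the induced Markov chain sum to 1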