3.9 Remarks on the Model
In what follows, we shall discuss some important details concerning the model that
we have hitherto abstained from addressing for the sake of a concise introduction.
Apart from being necessary for the exactness of the mathematical model, most of them
have practical consequences as well.
3.9.1 Infinite-Horizon Problems
As described in Sect. 3.1, we distinguish between episodic tasks, that is, those that
terminate, and continuing tasks, that is, those that do not terminate. To facilitate
their treatment in form and content, both will be considered within a unified
framework.
As regards continuing tasks, we have no further remarks apart from the requirement
that γ < 1. As has already been established in Sect. 3.2, this requirement is
necessary to ensure existence and uniqueness of the expected results.
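As a brief illustration of why this requirement suffices for existence (assuming here that (3.1) denotes the usual discounted sum of rewards and that the rewards are bounded by some constant R_max, both of which are assumptions of this sketch rather than statements of the text), the return is dominated by a geometric series,

|∑_{t=1}^∞ γ^{t-1} r_t| ≤ R_max ∑_{t=1}^∞ γ^{t-1} = R_max / (1 - γ),

which is finite precisely because γ < 1.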
With regard to episodic tasks, like those that primarily concern our recommendation
engines, the following question arises: how, after all, do we describe the
end of an episode? Since, according to (3.2), the transition probabilities for
each state-action pair sum up to one, an episode formally never terminates. To
circumvent this, we introduce a so-called terminal (or absorbing) state, which
admits transitions to no state other than itself, and the corresponding reward is
set to zero (Fig. 3.8).
Hence, after a certain time step t_a at which the terminal state has been reached
(in the example depicted in Fig. 3.8, this time step is 3), all further rewards are

r_t = 0,  t > t_a.
Thus, the sum (3.1) is also well defined for episodic tasks, and we may therefore
consider both continuing and episodic tasks as infinite sums. Both types of tasks
are said to be infinite-horizon problems [BT96]. Finally, it should be mentioned
that episodic tasks with γ = 1 are referred to as stochastic shortest path problems,
the special properties of which have been studied comprehensively in control
theory.
Fig. 3.8 Example of an episode with terminal state (gray box): states s_0, s_1, s_2, s_3, with rewards r_1 = r_2 = r_3 = +1 and r_4 = r_5 = 0 once the terminal state has been reached
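The following Python sketch is not part of the text; it merely illustrates the construction discussed above on a toy chain in the spirit of Fig. 3.8. The state names, action name, reward values, and the choice of γ are illustrative assumptions. The sketch checks that the transition probabilities of every state-action pair sum to one, as required by (3.2), and shows that the discounted return is unaffected by the zero-reward tail after the terminal state, so the infinite sum (3.1) is well defined.

# Toy episodic MDP in the spirit of Fig. 3.8 (illustrative values only).
# States s0..s2 move deterministically to the next state; s3 is terminal
# and transitions only to itself with reward 0.

GAMMA = 0.9  # illustrative discount factor

# transitions[state][action] -> list of (next_state, probability, reward)
transitions = {
    "s0": {"go": [("s1", 1.0, +1.0)]},
    "s1": {"go": [("s2", 1.0, +1.0)]},
    "s2": {"go": [("s3", 1.0, +1.0)]},
    "s3": {"go": [("s3", 1.0, 0.0)]},  # absorbing terminal state, zero reward
}

# Check (3.2): for every state-action pair the probabilities sum to one.
for state, actions in transitions.items():
    for action, outcomes in actions.items():
        assert abs(sum(p for _, p, _ in outcomes) - 1.0) < 1e-12

def discounted_return(start, steps, gamma=GAMMA):
    """Follow `steps` transitions (always taking action 'go') and
    accumulate the discounted sum of rewards."""
    total, state = 0.0, start
    for t in range(steps):
        next_state, _, reward = transitions[state]["go"][0]
        total += gamma ** t * reward
        state = next_state
    return total

# Once the terminal state is reached, all further rewards are zero, so
# truncating the "infinite" sum anywhere beyond that point changes nothing.
print(discounted_return("s0", steps=3))     # finite episode: 1 + gamma + gamma^2
print(discounted_return("s0", steps=1000))  # same value despite many more steps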