$V_k(x)$ as the classifier does not aim at modelling the value for this state. The
most reliable model in such a case is in fact given by the global model $V(x)$.
Generally, the global model will be used for all updates, regardless of whether
the classifier matches the next state or not. This is justified by the observation
that the global model is on average more accurate than the local models, as was
established in Chap. 6. Based on this principle, Bellman's Equation $V = TV$
can be reformulated for LCS with independent classifiers to
$$V_k = \Pi_k T V = \Pi_k T \sum_{k'} G_{k'} V_{k'}, \qquad k = 1, \dots, K, \tag{9.22}$$
where $\Pi_k$ expresses the approximation operator for classifier $k$, which does not
necessarily need to describe a linear approximation. By applying $\sum_k G_k$ to both
sides of the first equality of (9.22) and using (9.21), we get the alternative
expression $V = \Pi T V$, which shows that (9.22) is in fact Bellman's Equation
with LCS approximation. Nonetheless, the relation is preferably expressed by
(9.22), as it shows what the classifiers model rather than what the global model
models. For a fixed model structure $\mathcal{M}$, any method that performs DP or RL
with the LCS model type described here should aim at finding the solution to
(9.22).
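The step from (9.22) to the global form can be written out explicitly. The following is a sketch that assumes (9.21) reads $V = \sum_k G_k V_k$, as the second equality of (9.22) suggests, and that the global approximation operator is the mixed operator $\Pi = \sum_k G_k \Pi_k$; the latter definition is not restated in this section and is an assumption here.

```latex
\begin{align*}
V_k &= \Pi_k T V, \qquad k = 1, \dots, K
  && \text{(first equality of (9.22))} \\
\sum_k G_k V_k &= \sum_k G_k \Pi_k T V
  && \text{(apply } G_k \text{ and sum over } k\text{)} \\
V &= \Bigl( \sum_k G_k \Pi_k \Bigr) T V = \Pi T V
  && \text{(by (9.21) and the assumed form of } \Pi\text{)}
\end{align*}
```

If some $\Pi_k$ is non-linear, the bracketed sum is to be read as the operator $U \mapsto \sum_k G_k \Pi_k(U)$ rather than as a matrix product.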
9.3.3 Asynchronous Value Iteration with LCS
Let us consider approximate value iteration before its asynchronous variant is
derived: as given in Sect. 9.2.3, approximate value iteration is performed by the
iteration $V_{t+1} = \Pi T V_t$. Therefore, using (9.21), value iteration with LCS is
given by the iteration
$$V_{k,t+1} = \Pi_k V_{t+1}, \quad \text{with} \quad V_{t+1} = T \sum_k G_{k,t} V_{k,t}, \tag{9.23}$$
which has to be performed by each classifier separately. The iteration was split
into two components to show that one first finds the updated value vector $V_{t+1}$
by applying the $T$ operator to the global model, which is then approximated by
each classifier separately. The subscript $\cdot_t$ is added to the mixing model to express
that it might depend on the current approximation and might therefore change
with each iteration. Note that the fixed point of (9.23) is the desired Bellman
Equation in the LCS context (9.22).
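The two-step iteration (9.23) can be illustrated with a small, self-contained sketch. It is not the book's implementation: it assumes that $T$ is the standard Bellman optimality backup, that each classifier is a simple averaging classifier (so $\Pi_k$ returns the mean of the target over the states it matches), and that the mixing model is fixed and uniform over the matching classifiers, whereas the text allows it to change with $t$. All function names, and the nested-list layout `p[a][i][j]` and `r[a][i][j]`, are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def bellman_backup(V: Vector, p, r, gamma: float) -> Vector:
    """(TV)(x_i) = max_a sum_j p(x_j | x_i, a) * (r_{x_i x_j}(a) + gamma * V(x_j))."""
    n = len(V)
    return [
        max(
            sum(p[a][i][j] * (r[a][i][j] + gamma * V[j]) for j in range(n))
            for a in range(len(p))
        )
        for i in range(n)
    ]


def mixing_weights(matching: List[List[int]]) -> List[Vector]:
    """Assumed mixing model: g_k(x) is uniform over the classifiers matching x."""
    K, n = len(matching), len(matching[0])
    g = [[0.0] * n for _ in range(K)]
    for x in range(n):
        total = sum(matching[k][x] for k in range(K))
        if total == 0:
            raise ValueError(f"state {x} is not matched by any classifier")
        for k in range(K):
            g[k][x] = matching[k][x] / total
    return g


def mix(g: List[Vector], V_k: List[Vector]) -> Vector:
    """Global model V(x) = sum_k g_k(x) * V_k(x)."""
    K, n = len(g), len(g[0])
    return [sum(g[k][x] * V_k[k][x] for k in range(K)) for x in range(n)]


def fit_averaging_classifier(match_k: List[int], target: Vector) -> Vector:
    """Assumed Pi_k: constant prediction equal to the mean target over matched states."""
    weight = sum(match_k)
    mean = sum(m * v for m, v in zip(match_k, target)) / weight if weight else 0.0
    return [mean] * len(target)


def lcs_value_iteration(
    p, r, gamma: float, matching: List[List[int]], iterations: int = 200
) -> Tuple[List[Vector], Vector]:
    """Iterate V_{t+1} = T sum_k G_k V_{k,t}, then V_{k,t+1} = Pi_k V_{t+1}."""
    n = len(matching[0])
    g = mixing_weights(matching)  # fixed here; may depend on t in general
    V_k = [[0.0] * n for _ in matching]
    for _ in range(iterations):
        V_next = bellman_backup(mix(g, V_k), p, r, gamma)
        V_k = [fit_averaging_classifier(m, V_next) for m in matching]
    return V_k, mix(g, V_k)


# Two states, one action: 0 -> 1 with reward 0, 1 -> 1 with reward 1.
p = [[[0.0, 1.0], [0.0, 1.0]]]
r = [[[0.0, 0.0], [0.0, 1.0]]]

# One classifier per state: the approximation is exact.
_, V_exact = lcs_value_iteration(p, r, 0.5, [[1, 0], [0, 1]])

# A single classifier covering both states: one shared value.
_, V_coarse = lcs_value_iteration(p, r, 0.5, [[1, 1]])
```

With one classifier per state the iteration reduces to exact value iteration, and for this toy problem with $\gamma = 0.5$ the global model approaches $(1, 2)$. With a single classifier covering both states the fixed point is the constant value $1$ for both, which shows how the solution of (9.22) depends on the model structure.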
The elements of the updated value vector $V_{t+1}$ are based on (9.23) and (9.9),
which results in
$$V_{t+1}(x_i) = \max_{a \in \mathcal{A}} \sum_{x_j \in \mathcal{X}} p(x_j | x_i, a) \Bigl( r_{x_i x_j}(a) + \gamma \sum_k g_{k,t}(x_j) V_{k,t}(x_j) \Bigr), \tag{9.24}$$
where $V_{t+1}(x_i)$ denotes the $i$th element of $V_{t+1}$, and $V_{k,t}(x_j)$ denotes the $j$th
element of $V_{k,t}$. Subsequently, each classifier is trained by batch learning, based