Information Technology Reference
In-Depth Information
9.4.3
·
In the following, π ( x ) denotes the sampling probability for state x according to
the steady state distribution of the Markov chain P μ ,and π k ( x )= m k ( x ) π ( x )
denotes this distribution augmented by the matching of classifier k .Also, M k =
diag( m k ( x 1 ) ,...,m k ( x N )) is the matching matrix, as in Chap. 5, D =
diag( π ( x 1 ) ,...,π ( x N )) is the sampling distribution matrix, and D k = M k D
is the sampling distribution matrix with respect to classifier k . Each classifier k
aims at finding the weight vector w k that minimises
Non-expansion with Respect to
D k ,whichisgi-
ven by w k =( X T D k X ) 1 X T D k V . Thus, a classifier's approximation operator is
the projection matrix Π k = X ( X T D k X ) 1 X T D k . such that V = X w = Π k V .
It cannot be guaranteed that general linear models form a non-expansion with
respect to
Xw k
V
· . Gordon, for example, has shown that this is not the case for
straight line models [96]. Averagers, on the other hand, are a form of linear
model, but provide a non-expansion with respect to
· and thus can be used
for both value iteration and policy iteration.
With respect to LCS, each single classifier, as well as the whole set of classifiers
need to conform to the non-expansion property. This rules out the general use
of linear model classifiers. Instead, only averaging classifiers (see Ex. 5.2) will be
discussed, as their model provides a non-expansion with respect to
·
:
Lemma 9.1. The model of averaging classifiers forms a non-expansion with re-
spect to the maximum norm.
Proof. As for averaging classifiers X =(1 ,..., 1) T , their approximation operator
is the same for all states and is given by
( Π k V ) j =Tr( D k ) 1
x∈X
π k ( x ) V ( x ) .
(9.35)
Let V , V be two vectors such that V
V , which implies that the vector
a = V
V is non-negative in all its elements. Thus, we have for any i ,
( Π k V ) i =Tr( D k ) 1
x∈X
Tr( D k ) 1
x∈X
π k ( x ) V ( x )
π k ( x ) a
( Π k V ) i , (9.36)
due to the non-negativity of the elements of D k a . Also, for any scalar b and
vector e =(1 ,..., 1) T ,
( Π k ( V + b e )) i =( Π k V ) i + b
(9.37)
holds.
Let V , V now be two arbitrary vectors, not bound to V
V ,andlet
V =max i |
V i |
c =
V
V i
be their maximum absolute distance. Thus, for
any i ,
V i
V i
c
V i + c
(9.38)
holds. Applying Π k and using the above properties gives
( Π k V ) i
( Π k V ) i
c
( Π k V ) i + c,
(9.39)
 
Search WWH ::




Custom Search