Given the linear approximation $\Pi_D = X(X^T D X)^{-1} X^T D$ that returns the estimate $\hat{V} = \Pi_D V = Xw$ that minimises the sampling-weighted distance $\|Xw - V\|_D$, this approximation operator is a non-expansion with respect to $\|\cdot\|_D$:
Lemma 9.3. The linear approximation operator $\Pi_D = X(X^T D X)^{-1} X^T D$ defines a non-expansion mapping with respect to the weighted norm $\|\cdot\|_D$.
Proof. Note that $D = \sqrt{D}\sqrt{D}$, and thus we have $\sqrt{D}\,\Pi_D = \bar{\Pi}_D \sqrt{D}$, where $\bar{\Pi}_D = \sqrt{D}\, X (X^T D X)^{-1} X^T \sqrt{D}$ is also a projection matrix. Therefore, for any two vectors $V, V'$, using the induced matrix norm $\|A\| = \max\{\|Ax\| : \|x\| \le 1\}$ and the property $\|\bar{\Pi}_D\| \le 1$ of projection matrices,
\[
\|\Pi_D V - \Pi_D V'\|_D = \|\sqrt{D}\,\Pi_D (V - V')\| = \|\bar{\Pi}_D \sqrt{D}\,(V - V')\| \le \|\sqrt{D}\,(V - V')\| = \|V - V'\|_D, \tag{9.48}
\]
which shows that $\Pi_D$ is a non-expansion with respect to $\|\cdot\|_D$.
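As a quick numerical sanity check of Lemma 9.3, the following sketch verifies the non-expansion property for randomly drawn value vectors. The feature matrix, sampling weights, and dimensions are arbitrary illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3                       # number of states, number of features

X = rng.normal(size=(n, p))       # feature matrix (full column rank with prob. 1)
d = rng.uniform(0.1, 1.0, size=n)
d /= d.sum()                      # sampling distribution pi as a probability vector
D = np.diag(d)

# Approximation operator Pi_D = X (X^T D X)^{-1} X^T D
Pi_D = X @ np.linalg.solve(X.T @ D @ X, X.T @ D)

def norm_D(z):
    """Weighted norm ||z||_D = sqrt(z^T D z)."""
    return float(np.sqrt(z @ D @ z))

V, Vp = rng.normal(size=n), rng.normal(size=n)
lhs = norm_D(Pi_D @ V - Pi_D @ Vp)   # ||Pi_D V - Pi_D V'||_D
rhs = norm_D(V - Vp)                 # ||V - V'||_D
print(lhs <= rhs + 1e-12)            # prints True
```

Since $\Pi_D$ is an orthogonal projection in the $D$-weighted inner product, the comparison holds for any choice of $X$, $D$, $V$, and $V'$.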
This shows that linear models are compatible with approximate policy iteration [215]. However, the LCS model discussed here is non-linear due to the independent training of the classifiers. Also, these classifiers are not trained according to the sampling distribution $\pi$ if they do not match all states. From the point of view of classifier $k$, the states are sampled according to $\mathrm{Tr}(D_k)^{-1} \pi_k$, where $\pi_k$ needs to be normalised by $\mathrm{Tr}(D_k)^{-1}$, as $\sum_x \pi_k(x) \le 1$ and therefore $\pi_k$ is not guaranteed to be a proper distribution. This implies that the approximation operator $\Pi_k$ is a non-expansion mapping with respect to $\|\cdot\|_{D_k}$ rather than $\|\cdot\|_D$, and $\|\Pi_k z\|_{D_k} \le \|z\|_{D_k}$ for any vector $z$. However, as $D_k = M_k D$, we have
\[
\|z\|_{D_k} = \|\sqrt{D_k}\, z\| = \|\sqrt{M_k}\sqrt{D}\, z\| \le \|\sqrt{M_k}\| \|\sqrt{D}\, z\| \le \|z\|_D. \tag{9.49}
\]
The second inequality is based on the matrix norm of a diagonal matrix being given by its largest diagonal element, and thus $\|\sqrt{M_k}\| = \max_x \sqrt{m_k(x)} \le 1$.
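The relation $\|z\|_{D_k} \le \|z\|_D$ is easy to confirm numerically. The sketch below builds $D_k = M_k D$ from matching degrees $m_k(x) \in [0, 1]$ and compares the two norms; the matching degrees and sampling weights are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6                                  # number of states

d = rng.uniform(0.1, 1.0, size=n)
d /= d.sum()
D = np.diag(d)                         # sampling distribution as diagonal matrix
m = rng.uniform(0.0, 1.0, size=n)      # matching degrees m_k(x) in [0, 1]
D_k = np.diag(m) @ D                   # D_k = M_k D

z = rng.normal(size=n)
norm_Dk = float(np.sqrt(z @ D_k @ z))  # ||z||_{D_k}
norm_D = float(np.sqrt(z @ D @ z))     # ||z||_D
print(norm_Dk <= norm_D + 1e-12)       # Eq. (9.49): prints True
```

Because every diagonal entry of $M_k$ is at most 1, the inequality holds for any $z$, $D$, and matching function.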
This implies that, for any two $V, V'$,
\[
\|\Pi_k V - \Pi_k V'\|_D \ge \|\Pi_k V - \Pi_k V'\|_{D_k} \le \|V - V'\|_{D_k} \le \|V - V'\|_D. \tag{9.50}
\]
Due to the first inequality having the wrong direction, we cannot state that $\Pi_k$ is a non-expansion with respect to $\|\cdot\|_D$. In fact, such a result is rather unlikely$^3$. Nonetheless, to be sure about either outcome, further investigation is required.
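To illustrate why a non-expansion result with respect to $\|\cdot\|_D$ is doubtful, the following small example constructs a partially matching classifier whose operator $\Pi_k = X(X^T D_k X)^{-1} X^T D_k$ expands the $\|\cdot\|_D$-distance of one particular vector pair. All values are hand-picked for illustration and do not appear in the text:

```python
import numpy as np

X = np.array([[1.0], [2.0]])      # single feature over two states
D = np.diag([0.5, 0.5])           # uniform sampling distribution
m = np.array([1.0, 0.1])          # classifier k matches state 2 only weakly
D_k = np.diag(m) @ D              # D_k = M_k D

# Classifier k's approximation operator Pi_k = X (X^T D_k X)^{-1} X^T D_k
Pi_k = X @ np.linalg.solve(X.T @ D_k @ X, X.T @ D_k)

def norm(z, W):
    """Weighted norm ||z||_W = sqrt(z^T W z)."""
    return float(np.sqrt(z @ W @ z))

z = np.array([1.0, 0.0])          # z = V - V' for some pair V, V'
print(norm(Pi_k @ z, D_k) <= norm(z, D_k))   # non-expansion in ||.||_{D_k}: True
print(norm(Pi_k @ z, D) > norm(z, D))        # expansion in ||.||_D: True
```

For this pair the $\|\cdot\|_{D_k}$ bound from (9.50) holds, yet the $\|\cdot\|_D$-distance grows, consistent with the caveat above.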
As there is no clear result for single classifiers, extending the investigation to sets of classifiers is superfluous. In any case, it is certain that, given stable classifier models, the non-expansion property of a whole set of classifiers is, as for $\|\cdot\|_\infty$, determined by the properties of the mixing model.
$^3$ We have previously stated that $\Pi_k$ is a non-expansion with respect to $\|\cdot\|_D$ [79]. While showing this, however, a flawed matrix equality was used, which invalidates the result.
 