This shows that it is safe to use LCS with independently trained averaging
classifiers for both value iteration and policy iteration, given that the mixing
weights are fixed. Fixing these weights, however, does not allow them to react to
the quality of a classifier's approximation. As discussed in Chap. 6, it is preferable
to adjust the mixing weights inversely proportional to the classifier's prediction
error.
To show that the mixing weights are relevant when investigating the non-expansion property of the LCS model, consider the following: given two states $\mathcal{X} = \{x_1, x_2\}$ that are sampled with equal frequency, $\pi(x_1) = \pi(x_2) = 1/2$, and two classifiers of which both match $x_2$, but only the first one matches $x_1$, we have $m_1(x_1) = m_1(x_2) = m_2(x_2) = 1$ and $m_2(x_1) = 0$. Let the two target vectors be $V = (0, 1)^T$ and $V' = (2, 4)^T$. As the classifiers are averaging classifiers, they will give the estimates $\hat{V}_1 = 1/2$, $\hat{V}_2 = 1$, $\hat{V}'_1 = 3$, $\hat{V}'_2 = 4$. For $x_1$ the global prediction is given by classifier 1. For $x_2$, on the other hand, the predictions of the classifiers are mixed and thus the global prediction will be in the range $[1/2, 1]$ for $\hat{V}(x_2)$ and within $[3, 4]$ for $\hat{V}'(x_2)$. Note that $\|V - V'\|_\infty = |V(x_2) - V'(x_2)| = 3$. Choosing arbitrary mixing weights, classifier 2 can be assigned the full weight for $\hat{V}'(x_2)$, such that $\hat{V}'(x_2) = 4$. As a result, $3 \leq |\hat{V}'(x_2) - \hat{V}(x_2)| \leq 3.5$, depending on how $\hat{V}_1(x_2)$ and $\hat{V}_2(x_2)$ are combined to $\hat{V}(x_2)$. Thus, for a particular set of mixing weights that assign non-zero weight to $\hat{V}_1(x_2)$, the non-expansion property is violated, which shows that mixing weights are relevant when considering this property.
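A minimal numerical sketch of this counterexample is given below; it assumes averaging classifiers whose estimate is the matching- and $\pi$-weighted mean of the target values, and the function name averaging_estimate is purely illustrative. It reproduces the estimates above and shows that any non-zero weight on Classifier 1 at $x_2$, combined with $\hat{V}'(x_2) = 4$, pushes the distance at $x_2$ beyond $\|V - V'\|_\infty = 3$.

```python
import numpy as np

# Two states x1, x2, sampled with equal frequency.
pi = np.array([0.5, 0.5])
# Matching functions: classifier 1 matches both states, classifier 2 only x2.
m = np.array([[1.0, 1.0],   # m_1(x1), m_1(x2)
              [0.0, 1.0]])  # m_2(x1), m_2(x2)
V  = np.array([0.0, 1.0])   # target vector V
Vp = np.array([2.0, 4.0])   # target vector V'

def averaging_estimate(target):
    """Estimate of each averaging classifier: matching- and pi-weighted mean."""
    w = m * pi
    return (w @ target) / w.sum(axis=1)

V_hat  = averaging_estimate(V)    # -> [0.5, 1.0]
Vp_hat = averaging_estimate(Vp)   # -> [3.0, 4.0]

# Mix the predictions at x2 with weight g on classifier 1 and (1 - g) on
# classifier 2 for V, while V'(x2) is given full weight on classifier 2.
for g in (0.0, 0.5, 1.0):
    V_x2  = g * V_hat[0] + (1 - g) * V_hat[1]
    Vp_x2 = Vp_hat[1]
    print(f"g = {g}: |V'(x2) - V(x2)| = {abs(Vp_x2 - V_x2)}  vs  ||V - V'||_inf = 3")
```

For any $g > 0$ the distance at $x_2$ exceeds 3, so the maximum-norm distance between the two approximations grows beyond that of the targets.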
In the above example, the non-expansion property was violated by using different mixing schemes for $V$ and $V'$. In the case of $V'$, the more accurate Classifier 2 was assigned the full weight. For $V$, on the other hand, some weight was assigned to the less accurate Classifier 1. Assigning the full weight to Classifier 2 in both cases would have preserved the non-expansion property. This raises the question of whether using a consistent mixing scheme, such as mixing by inverse prediction error, guarantees non-expansion with respect to the maximum norm and thus convergence of the algorithm. More generally, what properties must the mixing scheme have such that non-expansion of $\Pi$ can be guaranteed?
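As a rough sketch of such a consistent scheme (not an implementation taken from this text), the following function mixes the matching classifiers' predictions at a single state in inverse proportion to their prediction errors; the name mix_by_inverse_error and the small eps constant are illustrative assumptions.

```python
import numpy as np

def mix_by_inverse_error(predictions, errors, matching, eps=1e-12):
    """Combine the matching classifiers' predictions for one state,
    weighting each in inverse proportion to its prediction error."""
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(matching, dtype=float) / (np.asarray(errors, dtype=float) + eps)
    weights /= weights.sum()
    return float(weights @ predictions)

# Classifier estimates and errors at x2 from the example above:
# classifier 1 predicts 1/2 (error 1/2), classifier 2 predicts 1 (error 0).
print(mix_by_inverse_error([0.5, 1.0], [0.5, 0.0], matching=[1.0, 1.0]))
# ~= 1.0: the more accurate classifier dominates, as in the V' case above.
```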
The proof of Lemma 9.2 relies on the linearity of $\Pi$, based on the constant mixing model, such that $\Pi V - \Pi V' = \Pi(V - V')$. Making the mixing model depend on the classifier predictions violates this linearity and requires a different method for the analysis of its properties. Besides some conjectures [80, 81], the question of which mixing weights guarantee a non-expansion with respect to $\|\cdot\|_\infty$ is still open and requires further investigation.
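To make the role of this linearity concrete, the sketch below assumes that, with fixed mixing weights, the approximation operator can be written as a single matrix Pi built from the matching functions, the sampling distribution, and the mixing weights, so that $\Pi V - \Pi V' = \Pi(V - V')$ holds by construction; the particular matrices are just the two-state example from above.

```python
import numpy as np

pi = np.array([0.5, 0.5])
m  = np.array([[1.0, 1.0],   # m_1(x1), m_1(x2)
               [0.0, 1.0]])  # m_2(x1), m_2(x2)
G  = np.array([[1.0, 0.5],   # fixed mixing weights g_k(x);
               [0.0, 0.5]])  # each column sums to 1 over the matching classifiers

# Averaging classifiers: row k of A maps a target vector V to classifier k's estimate.
A = (m * pi) / (m * pi).sum(axis=1, keepdims=True)

# With fixed mixing weights the whole approximation is one matrix,
# Pi[x, :] = sum_k g_k(x) * A[k, :], hence Pi is linear in the target vector.
Pi = G.T @ A

V, Vp = np.array([0.0, 1.0]), np.array([2.0, 4.0])
print(np.allclose(Pi @ V - Pi @ Vp, Pi @ (V - Vp)))   # True: Pi V - Pi V' = Pi (V - V')
```

Once the mixing weights themselves depend on the classifier predictions, no such single matrix exists, which is why the proof technique no longer applies.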
9.4.4 Non-expansion with Respect to $\|\cdot\|_D$
Recall that the diagonal of $D$ is the sampling distribution $\pi$ over $\mathcal{X}$ with respect to a particular policy $\mu$, and is given by the steady-state probabilities of the Markov chain $P^\mu$. Following this Markov chain by performing actions according to $\mu$ guarantees that the states are sampled according to $\pi$. In the following, it is assumed that this is the case.
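As a small illustration of this relationship, the sketch below computes the steady-state probabilities of a made-up, assumed ergodic transition matrix P_mu as the left eigenvector associated with eigenvalue 1; it is not tied to any particular LCS implementation.

```python
import numpy as np

# Hypothetical transition matrix P^mu over three states (rows sum to 1),
# assumed ergodic so that a unique steady-state distribution exists.
P_mu = np.array([[0.1, 0.6, 0.3],
                 [0.4, 0.4, 0.2],
                 [0.5, 0.2, 0.3]])

# pi is the left eigenvector of P_mu for eigenvalue 1: pi P_mu = pi, sum(pi) = 1.
eigvals, eigvecs = np.linalg.eig(P_mu.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi /= pi.sum()

print(pi)                            # steady-state sampling distribution
print(np.allclose(pi @ P_mu, pi))    # True: pi is stationary under P_mu
# D = diag(pi) then supplies the weights for the ||.||_D norm.
```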