This shows that it is safe to use LCS with independently trained averaging
classifiers for both value iteration and policy iteration, given that the mixing
weights are fixed. Fixing these weights, however, does not allow them to react to
the quality of a classifier's approximation. As discussed in Chap. 6, it is preferable
to adjust the mixing weights inversely proportional to the classifier's prediction
error.
To show that the mixing weights are relevant when investigating the non-expansion property of the LCS model, consider the following: given two states $\mathcal{X} = \{x_1, x_2\}$ that are sampled with equal frequency, $\pi(x_1) = \pi(x_2) = 1/2$, and two classifiers of which both match $x_2$, but only the first one matches $x_1$, we have $m_1(x_1) = m_1(x_2) = m_2(x_2) = 1$ and $m_2(x_1) = 0$. Let the two target vectors be $V = (0, 1)^T$ and $V' = (2, 4)^T$. As the classifiers are averaging classifiers, they will give the estimates $\hat{V}_1 = 1/2$, $\hat{V}_2 = 1$, $\hat{V}'_1 = 3$, $\hat{V}'_2 = 4$. For $x_1$ the global prediction is given by classifier 1. For $x_2$, on the other hand, the predictions of the classifiers are mixed and thus the global prediction will be in the range $[1/2, 1]$ for $\hat{V}(x_2)$ and within $[3, 4]$ for $\hat{V}'(x_2)$. Note that $\|V - V'\|_\infty = |V(x_2) - V'(x_2)| = 3$. Choosing arbitrary mixing weights, classifier 2 can be assigned the full weight for $\hat{V}'(x_2)$, such that $\hat{V}'(x_2) = 4$. As a result, $3 \leq |\hat{V}'(x_2) - \hat{V}(x_2)| \leq 3.5$, depending on how $\hat{V}_1(x_2)$ and $\hat{V}_2(x_2)$ are combined to $\hat{V}(x_2)$. Thus, for a particular set of mixing weights that assign non-zero weight to $\hat{V}_1(x_2)$, the non-expansion property is violated, which shows that mixing weights are relevant when considering this property.
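A minimal numerical sketch of this counterexample is given below; it assumes averaging classifiers whose estimate is the matching- and $\pi$-weighted mean of the target values, and the function name averaging_estimate is purely illustrative. It reproduces the estimates above and shows that any non-zero weight on Classifier 1 at $x_2$, combined with $\hat{V}'(x_2) = 4$, pushes the distance at $x_2$ beyond $\|V - V'\|_\infty = 3$.

```python
import numpy as np

# Two states x1, x2, sampled with equal frequency.
pi = np.array([0.5, 0.5])
# Matching functions: classifier 1 matches both states, classifier 2 only x2.
m = np.array([[1.0, 1.0],   # m_1(x1), m_1(x2)
              [0.0, 1.0]])  # m_2(x1), m_2(x2)
V  = np.array([0.0, 1.0])   # target vector V
Vp = np.array([2.0, 4.0])   # target vector V'

def averaging_estimate(target):
    """Estimate of each averaging classifier: matching- and pi-weighted mean."""
    w = m * pi
    return (w @ target) / w.sum(axis=1)

V_hat  = averaging_estimate(V)    # -> [0.5, 1.0]
Vp_hat = averaging_estimate(Vp)   # -> [3.0, 4.0]

# Mix the predictions at x2 with weight g on classifier 1 and (1 - g) on
# classifier 2 for V, while V'(x2) is given full weight on classifier 2.
for g in (0.0, 0.5, 1.0):
    V_x2  = g * V_hat[0] + (1 - g) * V_hat[1]
    Vp_x2 = Vp_hat[1]
    print(f"g = {g}: |V'(x2) - V(x2)| = {abs(Vp_x2 - V_x2)}  vs  ||V - V'||_inf = 3")
```

For any $g > 0$ the distance at $x_2$ exceeds 3, so the maximum-norm distance between the two approximations grows beyond that of the targets.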
In the above example, the non-expansion property was violated by using different mixing schemes for $V$ and $V'$. In the case of $V'$, the more accurate Classifier 2 was assigned the full weight. For $V$, on the other hand, some weight was assigned to the less accurate Classifier 1. Assigning the full weight to Classifier 2 in both cases would have preserved the non-expansion property. This raises the question of whether using a consistent mixing scheme, such as mixing by inverse prediction error, guarantees non-expansion with respect to the maximum norm and thus convergence of the algorithm. More generally, what properties must the mixing scheme have such that non-expansion of $\Pi$ can be guaranteed?
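As a rough sketch of such a consistent scheme (not an implementation taken from this text), the following function mixes the matching classifiers' predictions at a single state in inverse proportion to their prediction errors; the name mix_by_inverse_error and the small eps constant are illustrative assumptions.

```python
import numpy as np

def mix_by_inverse_error(predictions, errors, matching, eps=1e-12):
    """Combine the matching classifiers' predictions for one state,
    weighting each in inverse proportion to its prediction error."""
    predictions = np.asarray(predictions, dtype=float)
    weights = np.asarray(matching, dtype=float) / (np.asarray(errors, dtype=float) + eps)
    weights /= weights.sum()
    return float(weights @ predictions)

# Classifier estimates and errors at x2 from the example above:
# classifier 1 predicts 1/2 (error 1/2), classifier 2 predicts 1 (error 0).
print(mix_by_inverse_error([0.5, 1.0], [0.5, 0.0], matching=[1.0, 1.0]))
# ~= 1.0: the more accurate classifier dominates, as in the V' case above.
```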
The proof of Lemma 9.2 relies on the linearity of $\Pi$, based on the constant mixing model, such that $\Pi V - \Pi V' = \Pi(V - V')$. Making the mixing model depend on the classifier predictions violates this linearity and requires a different method for the analysis of its properties. Besides some conjectures [80, 81], the question of which mixing weights guarantee a non-expansion with respect to $\|\cdot\|_\infty$ is still open and requires further investigation.
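To make the role of this linearity concrete, the sketch below assumes that, with fixed mixing weights, the approximation operator can be written as a single matrix Pi built from the matching functions, the sampling distribution, and the mixing weights, so that $\Pi V - \Pi V' = \Pi(V - V')$ holds by construction; the particular matrices are just the two-state example from above.

```python
import numpy as np

pi = np.array([0.5, 0.5])
m  = np.array([[1.0, 1.0],   # m_1(x1), m_1(x2)
               [0.0, 1.0]])  # m_2(x1), m_2(x2)
G  = np.array([[1.0, 0.5],   # fixed mixing weights g_k(x);
               [0.0, 0.5]])  # each column sums to 1 over the matching classifiers

# Averaging classifiers: row k of A maps a target vector V to classifier k's estimate.
A = (m * pi) / (m * pi).sum(axis=1, keepdims=True)

# With fixed mixing weights the whole approximation is one matrix,
# Pi[x, :] = sum_k g_k(x) * A[k, :], hence Pi is linear in the target vector.
Pi = G.T @ A

V, Vp = np.array([0.0, 1.0]), np.array([2.0, 4.0])
print(np.allclose(Pi @ V - Pi @ Vp, Pi @ (V - Vp)))   # True: Pi V - Pi V' = Pi (V - V')
```

Once the mixing weights themselves depend on the classifier predictions, no such single matrix exists, which is why the proof technique no longer applies.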
9.4.4 Non-expansion with Respect to $\|\cdot\|_D$
Recall that the diagonal of $D$ is the sampling distribution $\pi$ over $\mathcal{X}$ with respect to a particular policy $\mu$, and is given by the steady-state probabilities of the Markov chain $P^\mu$. Following this Markov chain by performing actions according to $\mu$ guarantees that the states are sampled according to $\pi$. In the following, it is assumed that this is the case.
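As a small illustration of this relationship, the sketch below computes the steady-state probabilities of a made-up, assumed ergodic transition matrix P_mu as the left eigenvector associated with eigenvalue 1; it is not tied to any particular LCS implementation.

```python
import numpy as np

# Hypothetical transition matrix P^mu over three states (rows sum to 1),
# assumed ergodic so that a unique steady-state distribution exists.
P_mu = np.array([[0.1, 0.6, 0.3],
                 [0.4, 0.4, 0.2],
                 [0.5, 0.2, 0.3]])

# pi is the left eigenvector of P_mu for eigenvalue 1: pi P_mu = pi, sum(pi) = 1.
eigvals, eigvecs = np.linalg.eig(P_mu.T)
pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
pi /= pi.sum()

print(pi)                            # steady-state sampling distribution
print(np.allclose(pi @ P_mu, pi))    # True: pi is stationary under P_mu
# D = diag(pi) then supplies the weights for the ||.||_D norm.
```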