2.2.5 The Problems of Early LCS
In most earlier classifier systems,¹ each classifier in the population had an
associated scalar strength. This strength was assigned by the credit allocation
subsystem and acted as the classifier's fitness, and hence as its quality rating.
When external reward was received, it contributed to the strength of all
classifiers that promoted the action leading to that reward. Learning from
immediate reward alone is not sufficient, as sequential decision tasks might
require a sequence of actions before any reward is received. Thus, reward needs
to be propagated back to all classifiers in the action sequence that caused it
to be received. The most popular scheme to perform this credit allocation was
the Implicit Bucket Brigade [112, 186, 187].
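To make the backward propagation of reward concrete, the following Python sketch implements a strength update loosely modeled on ZCS-style formulations of the implicit bucket brigade. It is a simplified illustration, not the exact scheme of [112, 186, 187]: the names `Classifier`, `bucket_brigade_step` and `run_chain`, the parameters `beta` (learning rate) and `gamma` (discount factor), their values, and the equal sharing of payments within an action set are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Classifier:
    # Hypothetical minimal classifier: only a name and a scalar strength.
    name: str
    strength: float


def bucket_brigade_step(action_set, prev_action_set, reward, beta=0.2, gamma=0.71):
    """One credit-allocation step in the style of the implicit bucket brigade.

    beta (learning rate) and gamma (discount factor) are assumed, illustrative values.
    """
    # Every classifier in the current action set pays a fraction beta of its strength into a bucket.
    bucket = 0.0
    for cl in action_set:
        payment = beta * cl.strength
        cl.strength -= payment
        bucket += payment
    # Immediate external reward (scaled by beta) is shared equally within the current action set.
    for cl in action_set:
        cl.strength += beta * reward / len(action_set)
    # The discounted bucket goes back to the classifiers that acted one step earlier;
    # at the start of a trial there is no previous action set and the bucket is discarded.
    for cl in prev_action_set:
        cl.strength += gamma * bucket / len(prev_action_set)


def run_chain(episodes=500, reward=1000.0):
    # Three states visited in order a -> b -> c; external reward only after c's action.
    chain = [Classifier("a", 10.0), Classifier("b", 10.0), Classifier("c", 10.0)]
    for _ in range(episodes):
        prev = []
        for i, cl in enumerate(chain):
            r = reward if i == len(chain) - 1 else 0.0
            bucket_brigade_step([cl], prev, r)
            prev = [cl]
    return {cl.name: round(cl.strength, 1) for cl in chain}


print(run_chain())  # {'a': 504.1, 'b': 710.0, 'c': 1000.0}
```

With reward only at the end of the chain, the strengths settle at roughly gamma**d * reward for a classifier d steps before the reward. This is precisely the discounting of credit with distance from the rewarding state that is discussed below.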
Even though this scheme worked fairly well, performance in more complicated
tasks was still not satisfactory. According to Kovacs [133, 132], the main
problem was the use of a classifier's strength as its reproductive fitness. This
causes only high-reward classifiers to be maintained, and thus the information
about low-rewarding areas of the environment is lost, and with it the knowledge
about whether the performed actions are indeed optimal. A related problem is that
if the credit assignment is discounted, that is, if classifiers that are far away
from the rewarding states receive less credit for causing this reward, then such
classifiers have a lower fitness and are more likely to be removed, causing
sub-optimal action selection in areas distant from rewarding states. Most
fundamental, however, is the problem that if the classifier strength is not
shared between the classifiers, then environments with layered payoff will lead
to the emergence of classifiers that match a large number of states, even though
they do not promote the best action in all of those states. Examples of such
environments are those that describe sequential decision tasks. It should be
pointed out that Kovacs does not consider fitness sharing in his investigations.
According to Bull and Hurst [34], optimal performance can be achieved even with
strength-based fitness as long as fitness sharing is used, but “[...] suitable
system parameters must be identified for a given problem”, and how to do this
remains open to further investigation.
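The following toy Python example illustrates two of these effects under strength-based fitness. It assumes, for illustration only, that a correct classifier d steps before the reward settles at strength gamma**d * reward (as in a discounted bucket-brigade-style scheme), and that an unshared overgeneral classifier's strength is simply the average of the payoffs it receives in the states it matches. The helper names, the payoff values and the deterministic "delete the weakest" rule are hypothetical.

```python
def steady_state_strengths(distances, reward=1000.0, gamma=0.71):
    # Assumed steady state of a discounted credit scheme: strength = gamma**d * reward.
    return {name: reward * gamma ** d for name, d in distances.items()}


def deletion_victim(strengths):
    # Strength used as fitness: the weakest classifier is the one removed (hypothetical rule).
    return min(strengths, key=strengths.get)


# Accurate classifiers, each advocating the best action 0..3 steps before the reward.
accurate = steady_state_strengths({"d0": 0, "d1": 1, "d2": 2, "d3": 3})

# Hypothetical overgeneral classifier: matches a state next to the reward (payoff 1000)
# and a state where its action is poor (assumed payoff 400); strength ~ average payoff.
population = dict(accurate, overgeneral=0.5 * 1000.0 + 0.5 * 400.0)

print(deletion_victim(population))  # d3
print(sorted(population, key=population.get, reverse=True))
# ['d0', 'd1', 'overgeneral', 'd2', 'd3']
```

The correct but distant classifier d3 is the first to be deleted, while the overgeneral classifier, despite advocating a poor action in half of the states it matches, outranks the accurate classifiers two and three steps from the reward.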
Forrest and Miller [88] have also shown that the stochastic selection of matching
classifiers can lead to instabilities in any LCS that reduces the strength of all
classifiers by a life tax after each performed action and that has a message list
too small for all active classifiers to post their messages at once. In addition
to these problems, Smith [196] investigated, in certain LCS types with internal
message lists, the emergence of parasitic classifiers that do not directly
contribute to action selection but gain from the successful performance of
other classifiers.
Even though various taxation techniques, fitness sharing [34], and other methods
have been developed to overcome the problems of overly general and parasitic
classifiers, LCS still did not perform satisfactorily in more complex tasks. A
more drastic change was required.
1 See [10, Chap. 2] for a description and discussion of earlier LCS.