[Figure panels: Environment; State; Action; Reward; Message List; Internal Messages; classifiers listed as [condition] → [action] [credit].]
Fig. 2.3. Schematic illustration of an LCS with a single message list. Its operation is
described in the main text.
All messages are usually encoded as binary strings. Hence, to allow classifier
conditions to match messages, we are required to encode the conditions and
actions of classifiers as binary strings as well. A classifier can generalise
over several different input messages by introducing don't care symbols “#” into
its condition, which match both 1's and 0's in the corresponding position of
the input message. The condition “0#1”, for example, matches inputs “001” and
“011” equally. Similarly, actions of the same length as classifier conditions can
also contain the “#” symbol (in this case called pass-through), which implies
that specific bits of the matching message are passed through to the actions,
allowing a single classifier to perform different actions depending on the input
message. This form of generalisation in the classifier actions is used much less
frequently than generalisation in the classifier conditions.
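The matching and pass-through rules above can be illustrated with a minimal Python sketch. It assumes that messages, conditions and actions are plain strings over "0", "1" and "#" of equal length; the function names `matches` and `apply_action` are hypothetical and not taken from any particular LCS implementation.

```python
def matches(condition: str, message: str) -> bool:
    """Return True if every non-'#' position of the condition equals the message bit."""
    if len(condition) != len(message):
        return False
    # '#' is a don't care symbol: it matches both '0' and '1'.
    return all(c == "#" or c == m for c, m in zip(condition, message))


def apply_action(action: str, message: str) -> str:
    """Build the output message; a '#' in the action passes the input bit through."""
    if len(action) != len(message):
        raise ValueError("action and message must have the same length")
    return "".join(m if a == "#" else a for a, m in zip(action, message))


# The condition "0#1" from the text matches "001" and "011" but not "101".
for msg in ("001", "011", "101"):
    print(msg, matches("0#1", msg))

# Pass-through: the middle bit of the action is copied from the matched message.
print(apply_action("1#0", "011"))  # prints 110
```

With pass-through, the same classifier with action "1#0" would output "100" for the input "001", which is how one rule can produce different actions for different inputs.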
The description above covers how the agent decides which actions to perform
(called the performance subsystem) but does not explain how such an agent can
react to external reward to optimise its behaviour in a given environment.
Generally, the behaviour is determined by the population of classifiers and the
conflict resolution subsystem. Hence, considering that the functionality of the
conflict resolution subsystem is determined by properties of the classifiers,
learning can be achieved by evaluating the quality of each classifier and aiming
at a population that only contains classifiers of high quality. This is achieved
by a combination of the credit allocation subsystem and the rule induction
subsystem. The role of the former is to distribute externally received reward to
the classifiers that promoted the actions responsible for receiving this reward.
The latter creates new rules based on classifiers with high credit, thereby
promoting those that are assumed to be of good quality.
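The text does not specify how credit is distributed or how new rules are created, so the following Python sketch makes purely illustrative assumptions: the external reward is shared equally among the classifiers that advocated the rewarded action (passed in as a list), each classifier's credit moves towards its share with a hypothetical learning rate `beta`, and rule induction copies the highest-credit classifier and mutates its condition. All names (`Classifier`, `allocate_credit`, `induce_rule`, `beta`, `mutation_rate`) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Classifier:
    condition: str  # string over '0', '1', '#'
    action: str
    credit: float = 0.0


def allocate_credit(promoters, reward, beta=0.2):
    """Assumed credit allocation: share the reward equally among the classifiers
    that promoted the rewarded action and move each credit towards that share."""
    if not promoters:
        return
    share = reward / len(promoters)
    for cl in promoters:
        cl.credit += beta * (share - cl.credit)


def induce_rule(population, rng, mutation_rate=0.1):
    """Assumed rule induction: copy the highest-credit classifier and mutate
    each condition symbol with probability mutation_rate."""
    parent = max(population, key=lambda cl: cl.credit)
    new_condition = "".join(
        rng.choice([s for s in "01#" if s != c]) if rng.random() < mutation_rate else c
        for c in parent.condition
    )
    # The offspring inherits the parent's action and credit (an assumption of this sketch).
    return Classifier(new_condition, parent.action, parent.credit)


population = [
    Classifier("0#1", "10"),
    Classifier("001", "10"),
    Classifier("1##", "01"),
]
# Suppose the first two classifiers matched "001", advocated action "10",
# and the environment returned a reward of 10.
allocate_credit(population[:2], reward=10.0)
child = induce_rule(population, random.Random(0))
population.append(child)
print(child)
```

This sketch omits removing low-credit classifiers, which would be needed to keep the population at a fixed size.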