Background - Design and Analysis of Learning Classifier Systems


[Figure 2.3 labels: Internal Messages; Message List; Environment]

    [condition] → [action] [credit]
    [11#....10] → [01..10] [10.5]
    [000....01] → [00..00] [12.7]
    [1#0....11] → [01..00] [ 6.1]
    ...
    [0#1....00] → [00..10] [ 1.2]

    State  [100....11]
    Action [00..10]
    Reward [5.2]

Fig. 2.3. Schematic illustration of an LCS with a single message list. Its operation is described in the main text.

All of the messages are usually encoded as binary strings. Hence, to allow classifier conditions to match messages, the conditions and actions of classifiers must be encoded as binary strings as well. A classifier can generalise over several different input messages by introducing don't care symbols “#” into its condition, which match both 1's and 0's in the corresponding position of the input message. The condition “0#1”, for example, matches the inputs “001” and “011” equally. Similarly, actions of the same length as classifier conditions can also contain the “#” symbol (in this case called pass-through), which implies that the corresponding bits of the matching message are passed through to the action, allowing a single classifier to perform different actions depending on the input message. This generalisation in classifier actions is used much less frequently than generalisation in classifier conditions.
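The matching and pass-through rules described above can be sketched in a few lines of Python. This is a minimal illustration, assuming messages, conditions, and actions are plain strings over "0", "1", and "#"; the function names `matches` and `resolve_action` are hypothetical and do not come from the text.

```python
def matches(condition: str, message: str) -> bool:
    """Return True if every condition position is '#' or equals the message bit."""
    if len(condition) != len(message):
        return False
    return all(c == "#" or c == m for c, m in zip(condition, message))


def resolve_action(action: str, message: str) -> str:
    """Replace each pass-through '#' in the action with the message bit
    at the same position (assumes action and message have equal length)."""
    if len(action) != len(message):
        raise ValueError("pass-through requires action and message of equal length")
    return "".join(m if a == "#" else a for a, m in zip(action, message))


print(matches("0#1", "001"))         # True
print(matches("0#1", "011"))         # True
print(matches("0#1", "101"))         # False
print(resolve_action("1#0", "011"))  # 110
print(resolve_action("1#0", "001"))  # 100
```

The last two calls show how one classifier with the hypothetical action “1#0” produces different actions for different matching messages, because the middle bit is copied from the input.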

The description above covers how the agent decides which actions to perform (called the performance subsystem) but does not explain how such an agent can react to external reward to optimise its behaviour in a given environment. Generally, the behaviour is determined by the population of classifiers and the conflict resolution subsystem. Hence, considering that the functionality of the conflict resolution subsystem is determined by properties of the classifiers, learning can be achieved by evaluating the quality of each classifier and aiming at a population that only contains classifiers of high quality. This is achieved by a combination of the credit allocation subsystem and the rule induction subsystem. The role of the former is to distribute externally received reward to the classifiers that promoted the actions responsible for receiving this reward. The latter creates new rules based on classifiers with high credit, to promote the ones that are assumed to be of good quality.
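The division of labour between the two subsystems can be illustrated with a rough sketch. The text does not specify an update rule or a selection scheme, so everything below is an assumption chosen for illustration: the `Classifier` record, the equal-share credit update with a learning `rate`, credit-proportional parent selection, and per-symbol mutation of the condition are all hypothetical, and the example strings are made up.

```python
import random
from dataclasses import dataclass


@dataclass
class Classifier:
    condition: str
    action: str
    credit: float


def allocate_credit(active, reward, rate=0.2):
    """Credit allocation (assumed rule): move the credit of each classifier
    that promoted the rewarded action towards an equal share of the reward."""
    if not active:
        return
    share = reward / len(active)
    for cl in active:
        cl.credit += rate * (share - cl.credit)


def induce_rule(population, rng, mutation_prob=0.1):
    """Rule induction (assumed scheme): copy a parent chosen in proportion
    to its credit and mutate symbols of its condition at random."""
    weights = [max(cl.credit, 0.0) for cl in population]
    if sum(weights) <= 0:
        weights = None  # no positive credit anywhere: choose uniformly
    parent = rng.choices(population, weights=weights, k=1)[0]
    condition = "".join(
        rng.choice("01#") if rng.random() < mutation_prob else c
        for c in parent.condition
    )
    # The child inherits the parent's credit; this is also an assumption.
    return Classifier(condition, parent.action, parent.credit)


population = [
    Classifier("0#1", "10", 10.5),
    Classifier("000", "00", 12.7),
    Classifier("1#0", "01", 6.1),
]
allocate_credit(population[:1], reward=5.2)   # only the first classifier was active
child = induce_rule(population, random.Random(0))
print(population[0], child)
```

Classifiers with higher credit are more likely to be chosen as parents, which is one simple way to bias the population towards rules that are assumed to be of good quality.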
