Training the Classifiers - Design and Analysis of Learning Classifier Systems


Solving for $\lambda$ and using $\sum_j w_j = 1$ and $\sum_j y_{nj} = 1$ for all $n$, we get $\lambda = \sum_n m(\mathbf{x}_n) = c$, which is the match count after $N$ observations. As a result, by the principle of maximum likelihood, the estimate of $\mathbf{w}$ after $N$ observations is

$$
\mathbf{w} = c^{-1} \sum_{n=1}^{N} m(\mathbf{x}_n)\, \mathbf{y}_n. \tag{5.82}
$$
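The step "solving for $\lambda$" can be spelled out as a short worked derivation. It assumes, as (5.82) implies, that the preceding stationarity condition gives $w_j = \lambda^{-1} \sum_{n} m(\mathbf{x}_n)\, y_{nj}$; summing this over $j$ then fixes $\lambda$:

$$
1 = \sum_j w_j
  = \lambda^{-1} \sum_{n=1}^{N} m(\mathbf{x}_n) \sum_j y_{nj}
  = \lambda^{-1} \sum_{n=1}^{N} m(\mathbf{x}_n)
\quad\Longrightarrow\quad
\lambda = \sum_{n=1}^{N} m(\mathbf{x}_n) = c.
$$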

Thus, the $j$th element of $\mathbf{w}$, representing the probability of the classifier having generated an observation of class $j$, is the number of matched observations of this class divided by the total number of matched observations: a straightforward frequentist measure.
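The batch estimate (5.82) can be sketched in a few lines of NumPy. This is a minimal illustration, not the book's implementation: the function name `batch_class_probabilities`, the one-hot encoding of $\mathbf{y}_n$ as rows of a matrix, and the data below are all hypothetical (the data are not those of Table 5.2).

```python
import numpy as np


def batch_class_probabilities(match, Y):
    """Maximum likelihood estimate of w as in (5.82) (hypothetical helper).

    match: array of shape (N,), matching degrees m(x_n)
    Y:     array of shape (N, D_Y), one-hot class labels y_n
    Returns (w, c), where c = sum_n m(x_n) is the match count.
    """
    match = np.asarray(match, dtype=float)
    Y = np.asarray(Y, dtype=float)
    c = match.sum()  # match count after N observations
    if c == 0:
        raise ValueError("classifier matched no observations; w is undefined")
    # w = c^{-1} * sum_n m(x_n) y_n, i.e. the matched class frequencies
    w = (match[:, None] * Y).sum(axis=0) / c
    return w, c


# Hypothetical data (not Table 5.2): five observations, two classes.
labels = np.array([0, 1, 1, 1, 1])   # index 0 = class 1, index 1 = class 2
Y = np.eye(2)[labels]                # one-hot encoding, D_Y = 2
match = np.array([1, 1, 0, 1, 1])    # the third observation is not matched
w, c = batch_class_probabilities(match, Y)
# Of the 4 matched observations, 1 is of class 1 and 3 are of class 2,
# so w = [0.25, 0.75] and c = 4.0.
```

Since each $\mathbf{y}_n$ is one-hot, the weighted row sum simply counts matched observations per class, which makes the frequentist reading of (5.82) explicit.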

5.5.3 Incremental Learning for Classification

Let $\mathbf{w}_N$ be the estimate of $\mathbf{w}$ after $N$ observations. Given the new observation $(\mathbf{x}_{N+1}, \mathbf{y}_{N+1})$, the aim of the incremental approach is to find a computationally efficient way to update $\mathbf{w}_N$ to reflect this new knowledge. By (5.82), $c_{N+1} \mathbf{w}_{N+1}$ is given by

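$$
\begin{aligned}
c_{N+1} \mathbf{w}_{N+1}
  &= \sum_{n=1}^{N+1} m(\mathbf{x}_n)\, \mathbf{y}_n \\
  &= \sum_{n=1}^{N} m(\mathbf{x}_n)\, \mathbf{y}_n + m(\mathbf{x}_{N+1})\, \mathbf{y}_{N+1} \\
  &= \big(c_{N+1} - m(\mathbf{x}_{N+1})\big)\, \mathbf{w}_N + m(\mathbf{x}_{N+1})\, \mathbf{y}_{N+1}.
\end{aligned}
\tag{5.83}
$$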

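The last line of (5.83) follows from $c_N \mathbf{w}_N = \sum_{n=1}^{N} m(\mathbf{x}_n)\, \mathbf{y}_n$ and $c_N = c_{N+1} - m(\mathbf{x}_{N+1})$. Dividing (5.83) by $c_{N+1}$ results in the final incremental update

$$
\mathbf{w}_{N+1} = \mathbf{w}_N - c_{N+1}^{-1}\, m(\mathbf{x}_{N+1}) \big(\mathbf{w}_N - \mathbf{y}_{N+1}\big), \tag{5.84}
$$

where the match count is itself updated incrementally by $c_{N+1} = c_N + m(\mathbf{x}_{N+1})$.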

This update tracks (5.82) exactly, is of complexity $\mathcal{O}(D_Y)$, where $D_Y$ is the number of classes, and only requires the parameter vector $\mathbf{w}$ and the match count $c$ to be stored. Thus, it is both accurate and efficient.
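The incremental update (5.84), together with $c_{N+1} = c_N + m(\mathbf{x}_{N+1})$, can be sketched as a small NumPy class that stores only $\mathbf{w}$ and $c$. The class name, the uniform initial value of $\mathbf{w}$, the one-hot label encoding, and the data are hypothetical choices for illustration; the example checks the result against the batch formula (5.82).

```python
import numpy as np


class IncrementalClassProbabilities:
    """Tracks w and the match count c using the update (5.84) (hypothetical)."""

    def __init__(self, n_classes):
        # The initial w is arbitrary: for the first matched observation,
        # c = m(x), so (5.84) sets w exactly to that observation's y.
        self.w = np.full(n_classes, 1.0 / n_classes)
        self.c = 0.0

    def update(self, m, y):
        """Incorporate an observation with matching degree m and one-hot label y."""
        if m == 0:
            return  # an unmatched observation changes neither w nor c
        self.c += m  # c_{N+1} = c_N + m(x_{N+1})
        self.w -= (m / self.c) * (self.w - y)  # update (5.84)


# Hypothetical data (not Table 5.2), fed one observation at a time.
labels = np.array([0, 1, 1, 1, 1])
Y = np.eye(2)[labels]                        # one-hot labels, D_Y = 2
match = np.array([1.0, 1.0, 0.0, 1.0, 1.0])  # third observation unmatched

model = IncrementalClassProbabilities(n_classes=2)
for m_n, y_n in zip(match, Y):
    model.update(m_n, y_n)

# Batch estimate (5.82) for comparison; both give w = [0.25, 0.75].
batch_w = (match[:, None] * Y).sum(axis=0) / match.sum()
```

Each call touches one vector of length $D_Y$ and one scalar, in line with the $\mathcal{O}(D_Y)$ cost. The early return for $m = 0$ also avoids a division by zero before the classifier has matched anything.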

Example 5.9 (Classifier Model for Classification). Figure 5.3 shows the data of a classification task with two distinct classes. Observations of classes 1 and 2 are shown by circles and squares, respectively. The larger rectangles indicate the areas of the input space matched by the three classifiers $c_1$, $c_2$, and $c_3$. Based on these data, Table 5.2 lists, for each classifier, the number of matched observations of each class as well as $\mathbf{w}$ and $\tau$.

Recall that the elements of $\mathbf{w}$ represent the estimated probabilities of the classifier having generated an observation of a specific class. The estimates in Table 5.2 show that classifier $c_3$ is most certain about modelling class 2, while classifier $c_2$ is most uncertain about which class it models. This is also reflected in $\tau^{-1}$, which is highest for $c_2$ and lowest for $c_3$. Thus, $c_3$ is the “best” classifier, while $c_2$ is the “worst”, an evaluation that matches what can be observed in Fig. 5.3.
