Therefore, the probability of the classifier having generated class $j$ is given by $w_j$, which is the $j$th element of its parameter vector $w \in \mathbb{R}^{D_Y}$.
5.5.1 A Quality Measure for Classification
Good classifiers are certain about which classes they are associated with. This
implies that one aims at finding classifiers that have a high probability associated
with a single class, and low probability for all other classes.
For a two-class problem, the relation $w_2 = 1 - w_1$ is required to hold to satisfy $\sum_j w_j = 1$. In such a case, the model's variance $\mathrm{var}(y|w) = w_1(1 - w_1)$ is a good measure of the model's quality, as $\mathrm{var}(y|w) = 0$ for $w_1 = 0$ or $w_2 = 0$, and it has its maximum $\mathrm{var}(y|w) = 0.25$ at $w_1 = 0.5$, which is the point of maximum uncertainty.
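The behaviour of this quality measure at the extreme and intermediate values can be checked numerically. Below is a minimal sketch (the function name is illustrative, not from the text):

```python
# Sketch of the two-class quality measure var(y|w) = w1 * (1 - w1).

def two_class_variance(w1: float) -> float:
    """Bernoulli variance of the class label given the parameter w1."""
    return w1 * (1.0 - w1)

# A certain classifier (w1 = 0 or w1 = 1) has zero variance ...
print(two_class_variance(0.0))  # 0.0
print(two_class_variance(1.0))  # 0.0
# ... while the point of maximum uncertainty yields the maximum 0.25.
print(two_class_variance(0.5))  # 0.25
```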
The same principle can be extended to multi-class problems by taking the product of the elements of $w$, denoted $\tau^{-1}$ and given by
$$\tau^{-1} = \prod_{j=1}^{D_Y} w_j. \tag{5.78}$$
In the three-class case, for example, the worst performance occurs at $w_1 = w_2 = w_3 = 1/3$, at which point $\tau^{-1}$ is maximised. Note that, unlike for linear regression, $\tau^{-1}$ is formally *not* the precision estimate.
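The product measure (5.78) and its three-class worst case can be sketched as follows (the function name is illustrative, not from the text):

```python
import math

def tau_inverse(w):
    """Multi-class uncertainty measure: the product of the elements of w, (5.78)."""
    return math.prod(w)

uniform = [1/3, 1/3, 1/3]   # worst case: all classes equally likely
certain = [1.0, 0.0, 0.0]   # fully certain about one class

print(tau_inverse(uniform))  # 1/27, the maximum over the simplex
print(tau_inverse(certain))  # 0.0
```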
As $\tau^{-1}$ is easily computed from $w$, its estimate does not need to be maintained separately. Thus, the description of the batch and incremental learning approaches deals exclusively with the estimation of $w$.
5.5.2 Batch Approach for Classification
Recall that the aim of a classifier is to solve (4.24), which, together with (5.77), results in the constrained optimisation problem
$$\max_{w} \sum_{n=1}^{N} \sum_{j=1}^{D_Y} m(x_n) y_{nj} \ln w_j, \tag{5.79}$$
$$\text{subject to} \quad \sum_{j=1}^{D_Y} w_j = 1.$$
Using the Lagrange multiplier $\lambda$ to express the constraint $1 - \sum_j w_j = 0$, the aim becomes to maximise
$$\sum_{n=1}^{N} \sum_{j=1}^{D_Y} m(x_n) y_{nj} \ln w_j + \lambda \left( 1 - \sum_{j=1}^{D_Y} w_j \right). \tag{5.80}$$
Differentiating the above with respect to $w_j$ for some $j$, setting it to 0, and solving for $w_j$ results in the estimate
$$w_j = \lambda^{-1} \sum_{n=1}^{N} m(x_n) y_{nj}. \tag{5.81}$$
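As a minimal sketch of this batch estimate: if the targets use 1-of-$D_Y$ coding so that $\sum_j y_{nj} = 1$, substituting the estimate into the constraint gives $\lambda = \sum_n m(x_n)$, which makes $w$ a vector of matching-weighted class frequencies. The function name and example data below are illustrative, not from the text:

```python
import numpy as np

def batch_classifier_weights(M, Y):
    """Batch estimate of w following (5.81).

    M: matching values m(x_n), shape (N,)
    Y: targets in 1-of-D_Y coding, shape (N, D_Y)
    Assumes sum_j y_nj = 1, so lambda = sum_n m(x_n) satisfies the constraint.
    """
    lam = M.sum()  # value of the Lagrange multiplier
    return (M[:, None] * Y).sum(axis=0) / lam

# Four observations, two classes, with partial matching of the middle ones.
M = np.array([1.0, 0.5, 1.0, 0.0])
Y = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
w = batch_classifier_weights(M, Y)
print(w)  # matching-weighted class frequencies; elements sum to 1
```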