Let $\langle x, y \rangle_A = x^T A y$ be the inner product of $x$ and $y$ weighted by $A$, forming the inner product space $\langle \cdot, \cdot \rangle_A$. Then, $\| x \|_A = \sqrt{\langle x, x \rangle_A}$ is the norm associated with the inner product space $\langle \cdot, \cdot \rangle_A$. Any two vectors $x$, $x'$ are said to be $A$-orthogonal if $\langle x, x' \rangle_A = 0$. Note that $\| x \| \equiv \| x \|_I$ is the Euclidean norm, where $I$ is the identity matrix.
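As a concrete illustration, the following short Python sketch (assuming NumPy; the names inner_A and norm_A are illustrative, not from the text) computes the $A$-weighted inner product and norm and checks $A$-orthogonality:

import numpy as np

def inner_A(x, y, A):
    # <x, y>_A = x^T A y
    return x @ A @ y

def norm_A(x, A):
    # ||x||_A = sqrt(<x, x>_A)
    return np.sqrt(inner_A(x, x, A))

A = np.diag([2.0, 1.0])              # a positive definite weight matrix
x = np.array([1.0, 0.0])
y = np.array([0.0, 3.0])
print(inner_A(x, y, A))              # 0.0, so x and y are A-orthogonal
print(norm_A(x, A))                  # sqrt(2)
print(norm_A(x, np.eye(2)))          # 1.0, the Euclidean norm with A = I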
5.2.1 The Weight Vector
Using the matrix notation introduced in (3.4), and defining the diagonal $N \times N$ matching matrix $M_k$ of classifier $k$ by $M_k = \mathrm{diag}(m_k(x_1), \dots, m_k(x_N))$, in this chapter simply denoted $M$, (5.5) can be rewritten to

$\min_w \, (Xw - y)^T M (Xw - y) = \min_w \, \| Xw - y \|_M^2 .$    (5.7)
Thus, the aim is to find the w that minimises the weighted distance between the
estimated outputs Xw and the observed outputs y in the inner product space
$\langle \cdot, \cdot \rangle_M$. This distance is convex with respect to $w$ and therefore has a unique
minimum [26]. Note that as the output space is single-dimensional, the set of
observed outputs is given by the vector y rather than the matrix Y .
The solution to (5.7) is found by setting its first derivative to zero, resulting in

$w = (X^T M X)^{-1} X^T M y .$    (5.8)
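As a sketch of this computation on synthetic data (the inputs, matching values, and outputs below are illustrative stand-ins, not from the text), (5.8) can be evaluated without forming the inverse explicitly:

import numpy as np

rng = np.random.default_rng(0)
N, D_X = 100, 3
X = rng.normal(size=(N, D_X))                     # input matrix
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=N)
M = np.diag(rng.uniform(size=N))                  # matching matrix, m(x_n) in [0, 1]

# w = (X^T M X)^{-1} X^T M y, computed by solving a linear system
w = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)
print(w)                                          # close to [0.5, -1.0, 2.0]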
Alternatively, a numerically more stable solution that can also be computed if $X^T M X$ is singular and therefore cannot be inverted, is

$w = (\sqrt{M} X)^+ \sqrt{M} y ,$    (5.9)

where $X^+ \equiv (X^T X)^{-1} X^T$ denotes the pseudo-inverse of matrix $X$ [19].
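Under the same illustrative assumptions as the previous sketch, (5.9) amounts to weighting $X$ and $y$ by the square root of $M$ (well defined, as $M$ is diagonal and non-negative) and applying the pseudo-inverse; where $X^T M X$ is non-singular, both routes agree:

import numpy as np

rng = np.random.default_rng(0)
N, D_X = 100, 3
X = rng.normal(size=(N, D_X))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=N)
M = np.diag(rng.uniform(size=N))

sqrt_M = np.sqrt(M)                                    # elementwise root of diagonal M
w_stable = np.linalg.pinv(sqrt_M @ X) @ (sqrt_M @ y)   # (5.9)
w_direct = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)   # (5.8)
print(np.allclose(w_stable, w_direct))                 # True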
Using the weight vector according to (5.8), the matching-weighted vector of estimated outputs $Xw$ evaluates to

$Xw = X (X^T M X)^{-1} X^T M y .$    (5.10)
Observe that $X (X^T M X)^{-1} X^T M$ is a projection matrix that projects the vector of observed outputs $y$ onto the hyperplane $\{ Xw \mid w \in \mathbb{R}^{D_X} \}$ with respect to $\langle \cdot, \cdot \rangle_M$. This result is intuitively plausible, as the $w$ that minimises the weighted distance $\| Xw - y \|_M$ between the estimated and the observed outputs yields the point $Xw$ on this hyperplane that is closest to $y$ with respect to $\langle \cdot, \cdot \rangle_M$, which is the orthogonal projection of $y$ in $\langle \cdot, \cdot \rangle_M$ onto this plane. This concept will be used extensively in Chap. 9.
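The projection interpretation can be verified numerically; a minimal sketch (with illustrative data) checks that $P = X (X^T M X)^{-1} X^T M$ is idempotent and that the residual $y - Py$ is $M$-orthogonal to the columns of $X$:

import numpy as np

rng = np.random.default_rng(1)
N, D_X = 50, 3
X = rng.normal(size=(N, D_X))
y = rng.normal(size=N)
M = np.diag(rng.uniform(size=N))

P = X @ np.linalg.solve(X.T @ M @ X, X.T @ M)   # projection matrix
print(np.allclose(P @ P, P))                    # projecting twice changes nothing
print(np.allclose(X.T @ M @ (y - P @ y), 0.0))  # residual is M-orthogonal to the plane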
5.2.2 The Noise Precision
Equation (5.6) needs to be solved in order to get the maximum likelihood noise precision. As before, we evaluate the maximum of (5.6) by setting its first derivative with respect to $\tau$ to zero, to get

$\tau^{-1} = c^{-1} \, \| Xw - y \|_M^2 ,$    (5.11)
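A sketch of this estimate on synthetic data follows; it assumes that $c$ in (5.6) is the match count, $c = \sum_n m(x_n) = \mathrm{tr}(M)$, which is an assumption from context rather than a definition given in this excerpt:

import numpy as np

rng = np.random.default_rng(2)
N, D_X = 200, 3
X = rng.normal(size=(N, D_X))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.5, size=N)  # true tau = 4
M = np.diag(rng.uniform(size=N))

w = np.linalg.solve(X.T @ M @ X, X.T @ M @ y)
c = np.trace(M)                               # assumed match count, sum_n m(x_n)
r = X @ w - y
tau_inv = (r @ M @ r) / c                     # tau^{-1} = c^{-1} ||Xw - y||^2_M
print(1.0 / tau_inv)                          # roughly 4 = 1 / 0.5^2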