Training the Classifiers - Design and Analysis of Learning Classifier Systems


Theorem 5.7 (Incremental Sum of Squared Error Update). Let the sequence of weight vector estimates $\{w_1, w_2, \dots\}$ satisfy the Principle of Orthogonality (5.16). Then

$$\|X_{N+1} w_{N+1} - y_{N+1}\|^2_{M_{N+1}} = \|X_N w_N - y_N\|^2_{M_N} + m(x_{N+1}) (w_N^T x_{N+1} - y_{N+1}) (w_{N+1}^T x_{N+1} - y_{N+1}) \qquad (5.68)$$

holds.

An almost identical derivation reveals that the sum of squared errors for the recency-weighted RLS variant is given by

$$\|X_{N+1} w_{N+1} - y_{N+1}\|^2_{M_{N+1}} = \lambda^{m(x_{N+1})} \|X_N w_N - y_N\|^2_{M_N} + m(x_{N+1}) (w_N^T x_{N+1} - y_{N+1}) (w_{N+1}^T x_{N+1} - y_{N+1}), \qquad (5.69)$$

where, when compared to (5.68), the current sum of squared errors is additionally discounted by the factor $\lambda^{m(x_{N+1})}$.
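The discounted update can be checked numerically in the scalar case. The following is a minimal sketch, not the author's implementation: it assumes an averaging classifier ($x_n = 1$), a discounted match count `c` that plays the role of the scalar RLS matrix, and hypothetical function names `recency_weighted_sse` and `batch_sse`. It compares the incrementally tracked sum of squared errors with the same quantity recomputed from scratch.

```python
import numpy as np

def recency_weighted_sse(ys, ms, lam):
    """Track the discounted sum of squared errors of an averaging classifier."""
    c = 0.0    # discounted match count (assumed scalar analogue of the RLS matrix)
    w = 0.0    # weight estimate, i.e. the discounted weighted mean of the outputs
    sse = 0.0  # discounted sum of squared errors
    for y, m in zip(ys, ms):
        discount = lam ** m
        c_new = discount * c + m
        if c_new == 0.0:          # nothing has been matched yet
            continue
        w_new = w + (m / c_new) * (y - w)
        # Recency-weighted update with x = 1: discount the old sum, then add
        # the product of the errors before and after the weight update.
        sse = discount * sse + m * (w - y) * (w_new - y)
        c, w = c_new, w_new
    return w, sse

def batch_sse(ys, ms, lam, w):
    """Recompute the discounted sum of squared errors from all observations."""
    ys, ms = np.asarray(ys, float), np.asarray(ms, float)
    # Observation n is discounted by lam ** (sum of matching degrees after n).
    later = np.cumsum(ms[::-1])[::-1] - ms
    return float(np.sum(lam ** later * ms * (w - ys) ** 2))

rng = np.random.default_rng(0)
ys = rng.normal(2.0, 0.5, size=200)
ms = rng.integers(0, 2, size=200).astype(float)   # binary matching
w, sse_inc = recency_weighted_sse(ys, ms, lam=0.95)
sse_ref = batch_sse(ys, ms, 0.95, w)
print(sse_inc, sse_ref)
```

Setting `lam=1.0` removes the discounting, so the same loop then tracks the undiscounted sum of squared errors.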

In summary, the unbiased noise precision estimate can be tracked by directly solving (5.63), where the match count is updated by

$$c_{N+1} = c_N + m(x_{N+1}), \qquad (5.70)$$

and the sum of squared errors is updated by (5.68). As Theorem 5.7 states, (5.68) is only valid if the Principle of Orthogonality holds. However, using the computationally cheaper RLS implementation that involves (5.35) introduces an initial bias and hence violates the Principle of Orthogonality. Nonetheless, if $\delta$ in $\Lambda_0^{-1} = \delta I$ is set to a very large positive scalar, this bias is negligible, and hence (5.68) is still applicable with only minor inaccuracy.
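The procedure summarised above can be sketched as follows. This is an illustration under stated assumptions rather than the book's implementation: the RLS recursion is written in its standard matrix-inversion-lemma form (assumed here to correspond to the update involving (5.35)), `P` stands for $\Lambda^{-1}$, the function name `rls_with_sse` is hypothetical, and the last line assumes that (5.63), which is not reproduced in this section, divides the sum of squared errors by $c_N - D_X$.

```python
import numpy as np

def rls_with_sse(X, y, m, delta=1e6):
    """RLS with matching that also tracks the match count and the SSE."""
    D = X.shape[1]
    P = delta * np.eye(D)     # Lambda_0^{-1} = delta * I, with delta large
    w = np.zeros(D)
    c, sse = 0.0, 0.0
    for x_n, y_n, m_n in zip(X, y, m):
        Px = P @ x_n
        P = P - m_n * np.outer(Px, Px) / (1.0 + m_n * (x_n @ Px))
        err_prior = w @ x_n - y_n               # error before the weight update
        w = w - m_n * (P @ x_n) * err_prior
        err_post = w @ x_n - y_n                # error after the weight update
        c += m_n                                # match count update (5.70)
        sse += m_n * err_prior * err_post       # SSE update (5.68)
    return w, c, sse

rng = np.random.default_rng(1)
N, D = 300, 3
X = np.column_stack([np.ones(N), rng.uniform(-1, 1, (N, D - 1))])
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.3, N)
m = rng.uniform(0, 1, N)                        # soft matching degrees

w, c, sse = rls_with_sse(X, y, m)

# Batch reference: matching-weighted least squares solved from scratch.
sw = np.sqrt(m)
w_ref, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
sse_ref = float(np.sum(m * (X @ w_ref - y) ** 2))

noise_var = sse / (c - D)   # assumed form of the unbiased estimate (5.63)
print(sse, sse_ref, noise_var)
```

With a large `delta` the incrementally tracked sum agrees with the batch value up to a small residual, which is the minor inaccuracy mentioned above; smaller values of `delta` make the gap visible.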

Example 5.8 (Noise Precision Estimation for Averaging Classifiers). Consider averaging classifiers, such that $x_n = 1$ for all $n > 0$. Using gradient-based methods to estimate the weight vector violates the Principle of Orthogonality, and hence (5.65) has to be used to estimate the noise precision, resulting in

$$\tau^{-1}_{N+1} = \tau^{-1}_N + m(x_{N+1}) \left( (w_{N+1} - y_{N+1})^2 - \tau^{-1}_N \right). \qquad (5.71)$$

Alternatively, we can use the RLS algorithm (5.46) for averaging classifiers, and use (5.68) to accurately track the noise precision by

$$\tau^{-1}_{N+1} = \tau^{-1}_N + m(x_{N+1}) (w_N - y_{N+1}) (w_{N+1} - y_{N+1}). \qquad (5.72)$$

Note that while the computational cost of both approaches is equal (in their application to averaging classifiers), the second approach is vastly superior in the accuracy of its weight vector and noise precision estimates and should therefore always be preferred.
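A minimal sketch of the second approach follows. It makes several assumptions: the weight update is the running-mean form $w_{N+1} = w_N + m(x_{N+1}) c_{N+1}^{-1} (y_{N+1} - w_N)$ (assumed to be what (5.46) reduces to for $x_n = 1$), the update term of (5.72) is accumulated as a sum of squared errors, and that sum is divided by $c_N - 1$ (the assumed form of (5.63) with $D_X = 1$). The function name `averaging_classifier` is hypothetical.

```python
import numpy as np

def averaging_classifier(ys, ms):
    """Track the weight and the noise variance of an averaging classifier."""
    c, w, sse = 0.0, 0.0, 0.0
    for y, m in zip(ys, ms):
        c_new = c + m                          # match count update (5.70)
        if c_new == 0.0:                       # nothing has been matched yet
            continue
        w_new = w + (m / c_new) * (y - w)      # running-mean weight update
        sse += m * (w - y) * (w_new - y)       # update term of (5.72)
        c, w = c_new, w_new
    # Assumed unbiased estimate: SSE divided by (match count - 1).
    noise_var = sse / (c - 1.0) if c > 1.0 else float("nan")
    return w, noise_var

rng = np.random.default_rng(2)
ys = rng.normal(3.0, 0.7, size=500)
ms = np.ones_like(ys)                          # every input is matched

w, noise_var = averaging_classifier(ys, ms)
print(w, np.mean(ys))
print(noise_var, np.var(ys, ddof=1))
```

When every input is matched, the tracked weight coincides with the sample mean and the tracked noise variance with the unbiased sample variance, so the incremental scheme loses no accuracy relative to a batch computation.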
