which demonstrates that the larger the values of i_n, the smaller the step size has to be to still guarantee stability of the algorithm. The time constant is bounded by
\frac{-1}{\ln\left(1 - \gamma\left(1 + N^{-1}\sum_n i_n^2\right)\right)} \leq T \leq \infty ,    (5.24)
showing that a large eigenvalue spread |λ_2/λ_1|, caused by on average high magnitudes of i_n, pushes the time constant towards infinity, which results in very slow convergence. Therefore, the convergence rate of steepest gradient descent frequently depends on the range of the inputs³. This dependency is demonstrated empirically in Sect. 5.4.
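To make this dependency concrete, the following sketch (illustrative only; the data, step size, and function names are assumptions, not taken from the text) runs batch steepest gradient descent on inputs of the form x_n = (1, i_n)^T for two input scales. With large inputs, the step size admitted by the largest eigenvalue is small, so the weight associated with the small eigenvalue converges very slowly:

# Illustrative sketch: convergence of steepest gradient descent under
# different eigenvalue spreads of the input correlation matrix.
import numpy as np

def steepest_descent(X, y, gamma, steps=100):
    """Full-batch gradient descent on the mean squared error (1/N)||X w - y||^2."""
    N, D = X.shape
    w = np.zeros(D)
    for _ in range(steps):
        grad = (2.0 / N) * X.T @ (X @ w - y)
        w = w - gamma * grad
    return w

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])

for scale in (1.0, 10.0):                          # typical magnitude of the inputs i_n
    i_n = scale * rng.standard_normal(1000)
    X = np.column_stack([np.ones_like(i_n), i_n])  # inputs x_n = (1, i_n)^T
    y = X @ w_true
    lam_max = np.linalg.eigvalsh(X.T @ X / len(y)).max()
    gamma = 1.0 / lam_max                          # step size limited by the largest eigenvalue
    w = steepest_descent(X, y, gamma)
    print(f"input scale {scale:4.1f}: w = {w}")    # large scale -> slow progress on the bias weight

With the larger input scale the eigenvalue spread is large, and after the same number of steps the bias weight is still far from its target, in line with the bound (5.24).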
5.3.3 Least Mean Squared
The Least Mean Squared (LMS) algorithm is an incremental approximation to steepest gradient descent. Rather than performing gradient descent on the error function given all observations, it follows the gradient of the error function given only the current observation. For this reason, it is also known as Stochastic Incremental Steepest Gradient Descent, ADALINE, or, after its developers Widrow and Hoff [234], the Widrow-Hoff Update.
By inspecting (5.5), the error function for the (N+1)th observation based on the model after N observations is m(x_{N+1})(w_N^T x_{N+1} - y_{N+1})^2, and its gradient with respect to w_N is therefore 2 m(x_{N+1}) x_{N+1} (w_N^T x_{N+1} - y_{N+1}). Using this local gradient estimate as a surrogate for the global gradient, the LMS update is given by

w_{N+1} = w_N + \gamma_{N+1}\, m(x_{N+1})\, x_{N+1} \left(y_{N+1} - w_N^T x_{N+1}\right) ,    (5.25)

starting with an arbitrary w_0.
As the gradient estimate is based only on the current input, the method suffers from gradient noise. Due to this noise, a constant step size γ will cause random motion close to the optimal approximation [105, Chap. 5].
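The following sketch (illustrative; the matching function, data, and step size are assumptions rather than details from the text) implements the update (5.25) with a constant step size and shows the resulting random motion around the optimal weights:

# Illustrative sketch of the LMS update (5.25) with a constant step size.
import numpy as np

def lms_step(w, x, y, gamma, m):
    """One LMS update: w <- w + gamma * m(x) * x * (y - w^T x)."""
    return w + gamma * m(x) * x * (y - w @ x)

rng = np.random.default_rng(1)
w_true = np.array([0.5, 1.5])
w = np.zeros(2)                     # arbitrary initial weight vector w_0
m = lambda x: 1.0                   # hypothetical classifier that matches every input
gamma = 0.05                        # constant step size

for _ in range(2000):
    i_n = rng.uniform(-1.0, 1.0)
    x = np.array([1.0, i_n])        # input x_n = (1, i_n)^T
    y = float(w_true @ x) + 0.1 * rng.standard_normal()   # noisy observation
    w = lms_step(w, x, y, gamma, m)

print(w)    # hovers near w_true but keeps fluctuating due to gradient noise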
Misadjustment Due to Local Gradient Estimate
Let h_N(w) = c_N^{-1} \|X_N w - y_N\|^2 be the mean squared error (MSE) after N observations as a function of the weight vector. The excess mean square estimation error is the difference between the MSE of the LMS algorithm and the minimal MSE after (5.16). The ratio between the excess MSE and the minimal MSE is the misadjustment, which is a measure of how far away the convergence area of LMS is from the optimal estimate. The estimate error for some small constant step size can, according to [105, Chap. 5], be estimated by

h_N(\hat{w}_N) + \frac{\gamma\, h_N(\hat{w}_N)}{2} \sum_{j=1}^{J} \lambda_j ,    (5.26)
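A rough numerical check of this estimate is sketched below (an illustration under assumptions: the classifier matches all inputs so that c_N = N, the λ_j are taken to be the eigenvalues of the input correlation matrix, and the data and constants are invented for the example):

# Illustrative sketch: empirical MSE of LMS with a small constant step size
# compared against the estimate h_N(w_hat) * (1 + gamma/2 * sum_j lambda_j).
import numpy as np

rng = np.random.default_rng(2)
N, gamma = 50_000, 0.01
w_true = np.array([0.5, 1.5])

i_n = rng.uniform(-1.0, 1.0, N)
X = np.column_stack([np.ones(N), i_n])           # inputs x_n = (1, i_n)^T
y = X @ w_true + 0.1 * rng.standard_normal(N)    # noisy targets

# Minimal MSE, obtained from the batch least squares solution w_hat.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
mse_min = np.mean((X @ w_hat - y) ** 2)

# Run LMS and measure the MSE it settles around (second half of the run,
# after the initial transient has died out).
w = np.zeros(2)
sq_errs = []
for x_n, y_n in zip(X, y):
    sq_errs.append((y_n - w @ x_n) ** 2)
    w = w + gamma * x_n * (y_n - w @ x_n)        # LMS update (5.25) with m(x) = 1
mse_lms = np.mean(sq_errs[N // 2:])

lam = np.linalg.eigvalsh(X.T @ X / N)            # eigenvalues of the input correlation matrix
mse_pred = mse_min * (1.0 + gamma / 2.0 * lam.sum())
print(f"measured LMS MSE: {mse_lms:.5f}   estimate (5.26): {mse_pred:.5f}")

For a small step size the two numbers lie close together, and according to (5.26) the gap between the measured MSE and the minimal MSE grows in proportion to the step size, which is the misadjustment described above.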
³ A similar LCS-related analysis was done by Lanzi et al. [142, 143], but there the stability criteria for steepest gradient descent were applied to the LMS algorithm.