is the average over all matched outputs, it becomes apparent that the LMS
algorithm does not achieve this optimum for arbitrary step sizes. Nonetheless,
it can readily be applied in recency-weighted settings, such as handling
non-stationary processes, as is required in reinforcement learning applications.
5.3.4 Normalised Least Mean Squared
As seen from (5.25), the magnitude of the weight update is directly proportional
to the new input vector x_{N+1}, causing gradient noise amplification [105,
Chap. 6]. Thus, if some elements of the input vector are large, the correction
based on a local error is amplified and causes additional noise. This problem
can be overcome by normalising the correction by the squared Euclidean norm of
the input, resulting in the update
w_{N+1} = w_N + \gamma_t \, m(x_{N+1}) \, \frac{x_{N+1}}{\|x_{N+1}\|^2} \left( y_{N+1} - w_N^\top x_{N+1} \right) .   (5.29)
This update equation can also be derived by calculating the weight vector update
that minimises the norm of the weight change ||w_{N+1} - w_N||^2, subject to the
constraint m(x_{N+1}) w_{N+1}^\top x_{N+1} = y_{N+1}. As such, the normalised LMS filter can
be seen as the solution to a constrained optimisation problem.
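
To make the update concrete, the following Python sketch implements (5.29) for a single observation. The names nlms_update, gamma and matched are illustrative rather than taken from the text, and the matching value m(x_{N+1}) is assumed to be a scalar. The example at the end also illustrates the constrained-optimisation view: with gamma = 1 and full matching, the updated weights satisfy the constraint w_{N+1}^\top x_{N+1} = y_{N+1} exactly.

```python
import numpy as np

def nlms_update(w, x, y, gamma, matched=1.0):
    """One normalised LMS step as in (5.29).

    w       -- current weight estimate w_N
    x       -- new input vector x_{N+1}
    y       -- new output y_{N+1}
    gamma   -- step size gamma_t
    matched -- matching value m(x_{N+1}), assumed here to be a scalar
    """
    error = y - np.dot(w, x)                      # local prediction error
    return w + gamma * matched * (x / np.dot(x, x)) * error

# With gamma = 1 and full matching, the new weights interpolate the latest
# observation exactly, i.e. they satisfy w_{N+1}^T x_{N+1} = y_{N+1}.
w = np.zeros(3)
x = np.array([0.5, -1.0, 2.0])
y = 1.5
w_new = nlms_update(w, x, y, gamma=1.0)
print(np.dot(w_new, x))                           # 1.5 (up to rounding)
```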
Regarding stability, the step size parameter γ is now weighted by the inverse of
the squared norm of the input vector. Thus, stability in the MSE sense depends
on the current input. The lower bound on the step size is still 0, while the upper
bound is generally larger than 2 if the input values are overestimated, and smaller
than 2 otherwise. The optimal step size, located at the largest decrease of the mean
squared deviation, is the centre of these two bounds [105, Chap. 6].
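
Written out, this weighting amounts to reading (5.29) as a standard LMS step with an input-dependent effective step size; the following rearrangement is only a restatement of the update above, not an additional result from [105]:

```latex
% (5.29) rearranged: a standard LMS correction whose step size is scaled
% by the inverse squared norm of the current input.
\[
  w_{N+1} = w_N
    + \underbrace{\frac{\gamma_t}{\|x_{N+1}\|^2}}_{\text{effective step size}}
      \, m(x_{N+1}) \, x_{N+1}
      \left( y_{N+1} - w_N^\top x_{N+1} \right),
\]
% so the bounds on the step size that guarantee stability in the MSE sense
% depend on \|x_{N+1}\|^2, that is, on the current input.
```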
As expected, the normalised LMS algorithm features a rate of convergence
that is higher than that of the standard LMS filter, as empirically demonstrated
by Douglas [76]. One drawback of the modification is that one needs to check
whether ||x_{N+1}||^2 is zero, in which case no update is performed, in order to
avoid a division by zero.
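
A minimal way of adding this check, continuing the illustrative sketch above (the tolerance eps is an assumption; the text only requires the squared norm to be tested for zero):

```python
import numpy as np

def nlms_update_safe(w, x, y, gamma, matched=1.0, eps=1e-12):
    """Normalised LMS step that skips the update when ||x_{N+1}||^2 is
    (numerically) zero, avoiding the division by zero."""
    sq_norm = float(np.dot(x, x))
    if sq_norm < eps:
        return w                                  # no update for a zero-norm input
    return w + gamma * matched * (x / sq_norm) * (y - np.dot(w, x))
```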
To summarise, both variants of the LMS algorithm have low computational
and space costs O(D_X), but they rely only on the local gradient estimate and may
hence feature slow convergence and misadjustment. The step size can be adjusted
to improve either convergence speed or misadjustment, but one cannot
improve both at the same time. Additionally, the speed of convergence is by
(5.21) influenced by the values of the inputs and might be severely reduced by
ill-conditioned inputs, as will be demonstrated in Sect. 5.4.
Let us recall that quickly getting an idea of the goodness-of-fit of a classifier
model, as measured by the model variance (5.13), requires a good weight vector
estimate. Despite their low computational cost, gradient-based methods are known
to suffer from low speed of convergence and are therefore not necessarily the best
choice for this task. The following sections describe incremental methods that
are computationally more costly, but are able to recursively track the weight
vector that satisfies (5.16) and are therefore optimal in the least squares sense.