each of these problems operates on an input space of dimensionality D_V, and
hence, using the least squares methods introduced in the previous chapter,
solving them has either complexity O(NKD_V) for the batch solution, or
O(KD_V) for each step of the incremental solution. Given that we usually have
D_V = 1 in LCS, this is certainly an appealing property.
When minimising (6.15) it is essential to consider that the values for r_nk,
given by (6.4), depend on the current v_k of all classifiers. Consequently,
when performing batch learning, it is not sufficient to solve all K least
squares problems only once, as the corresponding targets change with the
updated values of V. Thus, one again needs to repeatedly update the estimate
of V until the cross-entropy (6.6) converges. A sketch of this batch procedure
is given below.
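The following Python sketch makes the iteration concrete. Since the exact
forms of (6.4), (6.6) and (6.15) are not reproduced on this page, the
generalised softmax mixing, the responsibilities R, the targets
ln(r_nk / m_k(x_n)), and all function and variable names are illustrative
assumptions rather than the book's own procedure.

import numpy as np

def batch_mixing_ls(Phi, M, P, max_iter=50, tol=1e-6):
    # Phi: (N, D_V) rows phi(x_n); M: (N, K) matching values m_k(x_n);
    # P:   (N, K) classifier densities p(y_n | x_n, theta_k), taken as given.
    D_V = Phi.shape[1]
    K = M.shape[1]
    V = np.zeros((K, D_V))
    prev_xent = np.inf
    for _ in range(max_iter):
        # generalised softmax mixing weights g_k(x_n) (assumed form)
        A = M * np.exp(Phi @ V.T)
        G = A / A.sum(axis=1, keepdims=True)
        # responsibilities r_nk computed from the current V (cf. (6.4))
        R = G * P
        R /= R.sum(axis=1, keepdims=True)
        # K weighted least-squares problems as in (6.15): weights m_k(x_n),
        # targets ln(r_nk / m_k(x_n)); rows with m_k(x_n) = 0 contribute nothing
        for k in range(K):
            w = M[:, k]
            t = np.log(np.clip(R[:, k], 1e-12, None) / np.clip(M[:, k], 1e-12, None))
            Pw = Phi * w[:, None]
            V[k] = np.linalg.solve(Pw.T @ Phi + 1e-10 * np.eye(D_V), Pw.T @ t)
        # repeat until the cross-entropy (6.6) stops improving
        xent = -np.sum(R * np.log(np.clip(G, 1e-12, None)))
        if abs(prev_xent - xent) < tol:
            break
        prev_xent = xent
    return V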
On the other hand, when using recursive least squares to provide an
incremental approximation of V, we need to honour the non-stationarity of the
target values by using the recency-weighted RLS variant. Hence, according to
Sect. 5.3.5, the update equations take the form
v_{k,N+1} = \lambda^{m_k(x_n)} v_{kN}
    + m_k(x_{N+1}) \Lambda_{k,N+1}^{-1} \phi(x_{N+1})
      \left( \ln \frac{r_{nk}}{m_k(x_n)} - v_{kN}^T \phi(x_{N+1}) \right),                  (6.16)

\Lambda_{k,N+1}^{-1} = \lambda^{-m_k(x_{N+1})} \Lambda_{kN}^{-1}
    - m_k(x_{N+1}) \lambda^{-m_k(x_{N+1})}
      \frac{\Lambda_{kN}^{-1} \phi(x_{N+1}) \phi(x_{N+1})^T \Lambda_{kN}^{-1}}
           {\lambda^{m_k(x_n)} + m_k(x_{N+1}) \phi(x_{N+1})^T \Lambda_{kN}^{-1} \phi(x_{N+1})},    (6.17)
where the v_k's and \Lambda_k's are initialised to v_{k0} = 0 and
\Lambda_{k0}^{-1} = \delta I for all k, with \delta being a large scalar. In
[121], Jordan and Jacobs initially set \lambda = 0.99 and increased it by a
fixed fraction (0.6) of its remaining distance to 1.0 every 1000 updates. This
seems a sensible approach to start with, but further empirical experience is
required to make definite recommendations.
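The per-classifier update can be sketched as follows; it follows (6.16) and
(6.17) as reconstructed above, together with the initialisation and the
\lambda-schedule just described. The class name, the default \delta = 1000,
and the way the target ln(r_k / m_k(x)) is supplied by the caller are
illustrative assumptions, not taken from the book.

import numpy as np

class MixingRLS:
    # Recency-weighted RLS for the mixing vector v_k of a single classifier k,
    # following (6.16) and (6.17). Names and defaults are illustrative.

    def __init__(self, d_v, delta=1000.0, lam=0.99):
        self.v = np.zeros(d_v)                # v_k0 = 0
        self.Lam_inv = delta * np.eye(d_v)    # Lambda_k0^{-1} = delta * I, delta large
        self.lam = lam                        # forgetting factor lambda

    def update(self, phi, m, r):
        # phi: phi(x_{N+1}); m: m_k(x_{N+1}); r: responsibility for this input
        if m == 0.0:
            return                            # non-matching inputs leave v_k unchanged
        target = np.log(max(r, 1e-12) / m)    # least-squares target ln(r / m_k(x))
        lam_m = self.lam ** m
        Li_phi = self.Lam_inv @ phi
        denom = lam_m + m * (phi @ Li_phi)
        # (6.17): recency-weighted update of Lambda_k^{-1}
        self.Lam_inv = (self.Lam_inv - m * np.outer(Li_phi, Li_phi) / denom) / lam_m
        # (6.16): update of v_k, using the already updated Lambda_k^{-1}
        self.v = lam_m * self.v + m * (self.Lam_inv @ phi) * (target - self.v @ phi)

    def anneal(self, fraction=0.6):
        # Jordan & Jacobs [121]: every 1000 updates move lambda a fixed
        # fraction of its remaining distance towards 1.0
        self.lam += fraction * (1.0 - self.lam)

In use, one such object would be kept per classifier, with anneal() called
every 1000 updates, following the schedule described above.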
As pointed out by Jordan and Jacobs [121], approximating the values of V by
least squares does not result in the same parameter estimates as when using the
IRLS algorithm, due to the use of least squares rather than maximum likelihood.
In fact, the least squares approach can be seen as an approximation to the
maximum likelihood solution under the assumption that the residual in (6.15) is
small, which is equivalent to assuming that the LCS model can fit the underlying
regression surface and that the noise is small. Nonetheless, they demonstrate
empirically that the least squares approach provides good results even when the
residual is large in the early stages of training [121]. In any case, in terms of
complexity it is a very appealing alternative to the IRLS algorithm.
6.2 Heuristic-Based Mixing Models
While the IRLS algorithm minimises (6.6), it does not scale well with the number
of classifiers. The least squares approximation, on the other hand, scales well,
but minimises (6.15) instead of (6.6), which does not always give good results,