An Algorithmic Description - Design and Analysis of Learning Classifier Systems


Lines 15 to 17 update the parameters of the variational posterior $q_\alpha(\alpha_k)$, as given by (7.40), (7.41), and (7.72). Here, the sum over all squared elements of $W_k$ is used to evaluate $\sum_j w_{kj}^T w_{kj}$.
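The identity used here can be checked numerically. The following is a minimal sketch, assuming that $w_{kj}$ denotes the $j$-th row of $W_k$; the matrix values and the helper names are made up for illustration only.

```python
# Sketch: the sum over all squared elements of W_k equals sum_j w_kj^T w_kj,
# assuming w_kj is the j-th row of W_k. The matrix values are hypothetical.

def sum_of_squared_elements(W):
    """Sum of every element of W squared (the squared Frobenius norm)."""
    return sum(w * w for row in W for w in row)


def sum_of_row_inner_products(W):
    """Sum over rows j of the inner product w_j^T w_j."""
    return sum(sum(a * b for a, b in zip(row, row)) for row in W)


W_k = [[1.0, -2.0, 0.5],
       [0.0, 3.0, -1.0]]  # hypothetical weight matrix, one row per output

frobenius_sq = sum_of_squared_elements(W_k)
row_sum = sum_of_row_inner_products(W_k)
```

Both expressions visit the same products in the same order, so a single pass over the elements of $W_k$ is sufficient and no per-row inner product has to be formed explicitly.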

The function determines convergence of the parameter updates in Lines 18 to 21 by computing the change of $\mathcal{L}_k(q)$ over two successive iterations. If this change drops below the system parameter $\Delta_s \mathcal{L}_k(q)$, then the function returns. The value $\mathcal{L}_k(q)$ is computed by Function VarClBound, which is described in Sect. 8.1.4. Its last argument is a vector of responsibilities for classifier $k$, which is substituted by the matching function values for reasons mentioned above. Each parameter update either increases $\mathcal{L}_k(q)$ or leaves it unchanged, which is specified in Line 21. If this is not the case, then the implementation is faulty and/or suffers from numerical instabilities. In the experiments that were performed, convergence was usually reached after 3-4 iterations.
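The convergence test described above can be sketched generically. The code below is a minimal illustration and not the classifier training function itself: `update` and `bound` are hypothetical callables standing in for the parameter updates and for the bound computation, and `max_iter` and `tol` are safeguards added here that the text does not mention.

```python
import math


def iterate_until_converged(update, bound, state, delta_s=1e-4,
                            max_iter=1000, tol=1e-9):
    """Repeat `update` until the change of `bound` drops to `delta_s` or below.

    `update` maps a state to a new state and `bound` maps a state to a
    scalar that no update should decrease. Both are assumed placeholders.
    """
    L = -math.inf
    delta = delta_s + 1.0  # guarantees that the loop body runs at least once
    iterations = 0
    while delta > delta_s and iterations < max_iter:
        state = update(state)
        L_prev = L
        L = bound(state)
        # A decreasing bound points to a faulty implementation or to
        # numerical instabilities, so fail loudly (up to a small tolerance).
        assert L >= L_prev - tol, "bound decreased between iterations"
        delta = abs(L - L_prev)
        iterations += 1
    return state, L, iterations


# Toy usage: each update moves x halfway towards 2, and the bound
# -(x - 2)^2 never decreases under that update.
state, L, n_iter = iterate_until_converged(
    update=lambda x: x + 0.5 * (2.0 - x),
    bound=lambda x: -(x - 2.0) ** 2,
    state=0.0,
    delta_s=0.01,
)
```

Starting the bound at negative infinity makes the first measured change infinite, so the threshold test only becomes meaningful from the second iteration onwards.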

8.1.3 Training the Mixing Model

Training the mixing model is more complex than training the classifiers, as the IRLS algorithm is used to find the parameters of $q_V(V)$. The function TrainMixing takes the model structure, data, and the parameters of the fully trained classifiers, and returns the parameters of the mixing model.

As with training the classifiers, the parameters of the mixing model are found incrementally, by sequentially updating the parameters of the variational posteriors $q_V(V)$, $q_\beta(\beta)$, and $q_Z(Z)$. Convergence of the updates is determined by

Function TrainMixing(M, X, Y, Φ, W, Λ^{-1}, a_τ, b_τ, a_α, b_α)

Input: matching matrix M, input matrix X, output matrix Y, mixing feature matrix Φ, classifier parameters W, Λ^{-1}, a_τ, b_τ, a_α, b_α
Output: D_V × K mixing weight matrix V, (K D_V) × (K D_V) mixing weight covariance matrix Λ_V^{-1}, mixing weight vector prior parameters a_β, b_β

get D_X, D_Y, D_V, K from shape of X, Y, Φ, W
V ← D_V × K matrix with elements sampled from N(0, b_β / a_β)
a_β ← {a_β1, ..., a_βK}, all initialised to a_βk = a_β
b_β ← {b_β1, ..., b_βK}, all initialised to b_βk = b_β
L_M(q) ← −∞
ΔL_M(q) ← Δ_s L_M(q) + 1
while ΔL_M(q) > Δ_s L_M(q) do
    V, Λ_V^{-1} ← TrainMixWeights(M, X, Y, Φ, W, Λ^{-1}, a_τ, b_τ, V, a_β, b_β)
    a_β, b_β ← TrainMixPriors(V, Λ_V^{-1})
    G ← Mixing(M, Φ, V)
    R ← Responsibilities(X, Y, G, W, Λ^{-1}, a_τ, b_τ)
    L_M,prev(q) ← L_M(q)
    L_M(q) ← VarMixBound(G, R, V, Λ_V^{-1}, a_β, b_β)
    ΔL_M(q) ← |L_M(q) − L_M,prev(q)|
return V, Λ_V^{-1}, a_β, b_β
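
The control flow of Function TrainMixing can be transcribed into Python as follows. This is a sketch under stated assumptions rather than a complete implementation: the five sub-functions (TrainMixWeights, TrainMixPriors, Mixing, Responsibilities, VarMixBound) are not defined in this section and are therefore passed in as callables, the default values of `a_beta0`, `b_beta0` and `delta_s` are illustrative, K is read from the columns of M, and the initial weights are assumed to be drawn from a zero-mean Gaussian with variance b_β / a_β.

```python
import math
import random


def train_mixing(M, X, Y, Phi, W, Lambda_inv, a_tau, b_tau,
                 train_mix_weights, train_mix_priors, mixing,
                 responsibilities, var_mix_bound,
                 a_beta0=1e-2, b_beta0=1e-4, delta_s=1e-2, seed=0):
    """Outer loop of the mixing model training (sketch).

    The five callables are hypothetical stand-ins for the sub-functions
    named in the pseudocode; their internals are not shown here.
    """
    D_V = len(Phi[0])  # columns of the N x D_V mixing feature matrix
    K = len(M[0])      # columns of the N x K matching matrix (assumption)

    # Assumed initialisation: zero mean, variance b_beta0 / a_beta0.
    rng = random.Random(seed)
    std = math.sqrt(b_beta0 / a_beta0)
    V = [[rng.gauss(0.0, std) for _ in range(K)] for _ in range(D_V)]
    a_beta = [a_beta0] * K
    b_beta = [b_beta0] * K

    L_M = -math.inf
    delta = delta_s + 1.0  # forces at least one pass through the loop
    while delta > delta_s:
        V, Lambda_V_inv = train_mix_weights(
            M, X, Y, Phi, W, Lambda_inv, a_tau, b_tau, V, a_beta, b_beta)
        a_beta, b_beta = train_mix_priors(V, Lambda_V_inv)
        G = mixing(M, Phi, V)
        R = responsibilities(X, Y, G, W, Lambda_inv, a_tau, b_tau)
        L_prev = L_M
        L_M = var_mix_bound(G, R, V, Lambda_V_inv, a_beta, b_beta)
        delta = abs(L_M - L_prev)
    return V, Lambda_V_inv, a_beta, b_beta
```

Passing the sub-functions as arguments keeps the sketch self-contained and lets the loop be exercised with simple stand-ins before the actual updates are available. The classifier prior parameters a_α, b_α appear in the pseudocode signature but are not used in its body, so they are omitted here.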
