Function TrainClassifier(m_k, X, Y)
Input: matching vector m_k, input matrix X, output matrix Y
Output: D_Y × D_X weight matrix W_k, D_X × D_X covariance matrix Λ_k^{-1},
        noise precision parameters a_τk, b_τk,
        weight vector prior parameters a_αk, b_αk

 1   get D_X, D_Y from shape of X, Y
 2   X_k ← X ⊗ √m_k
 3   Y_k ← Y ⊗ √m_k
 4   a_αk, b_αk ← a_α, b_α
 5   a_τk, b_τk ← a_τ, b_τ
 6   L_k(q) ← −∞
 7   ΔL_k(q) ← Δ_s L_k(q) + 1
 8   while ΔL_k(q) > Δ_s L_k(q) do
 9       E_α(α_k) ← a_αk / b_αk
10       Λ_k ← E_α(α_k) I + X_k^T X_k
11       Λ_k^{-1} ← (Λ_k)^{-1}
12       W_k ← Y_k^T X_k Λ_k^{-1}
13       a_τk ← a_τ + (1/2) Sum(m_k)
14       b_τk ← b_τ + (1/(2 D_Y)) (Sum(Y_k ⊗ Y_k) − Sum(W_k ⊗ W_k Λ_k))
15       E_τ(τ_k) ← a_τk / b_τk
16       a_αk ← a_α + (D_X D_Y) / 2
17       b_αk ← b_α + (1/2) (E_τ(τ_k) Sum(W_k ⊗ W_k) + D_Y Tr(Λ_k^{-1}))
18       L_{k,prev}(q) ← L_k(q)
19       L_k(q) ← VarClBound(X, Y, W_k, Λ_k^{-1}, a_τk, b_τk, a_αk, b_αk, m_k)
20       ΔL_k(q) ← L_k(q) − L_{k,prev}(q)
21       assert ΔL_k(q) ≥ 0
22   return W_k, Λ_k^{-1}, a_τk, b_τk, a_αk, b_αk
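The update loop above translates almost line-for-line into NumPy. The following is a minimal sketch, not the book's implementation: the prior parameters a_α, b_α, a_τ, b_τ default to assumed small constants, and since VarClBound is defined elsewhere in the chapter, the convergence test on ΔL_k(q) is replaced by a fixed iteration count (an assumption, not the stopping rule used above).

```python
import numpy as np

def train_classifier(m_k, X, Y, a_alpha=1e-2, b_alpha=1e-4,
                     a_tau=1e-2, b_tau=1e-4, n_iter=20):
    """Variational update loop of TrainClassifier (Lines 1-17).

    Sketch only: runs a fixed number of iterations instead of
    monitoring the variational bound L_k(q) via VarClBound.
    """
    N, D_X = X.shape
    _, D_Y = Y.shape                               # Line 1
    sqrt_m = np.sqrt(m_k).reshape(-1, 1)
    X_k = X * sqrt_m                               # Line 2: X ⊗ √m_k
    Y_k = Y * sqrt_m                               # Line 3: Y ⊗ √m_k
    a_alpha_k, b_alpha_k = a_alpha, b_alpha        # Line 4
    a_tau_k, b_tau_k = a_tau, b_tau                # Line 5
    for _ in range(n_iter):                        # Lines 8-21 (fixed count)
        E_alpha = a_alpha_k / b_alpha_k            # Line 9
        Lam_k = E_alpha * np.eye(D_X) + X_k.T @ X_k        # Line 10
        Lam_inv = np.linalg.inv(Lam_k)             # Line 11
        W_k = Y_k.T @ X_k @ Lam_inv                # Line 12
        a_tau_k = a_tau + 0.5 * m_k.sum()          # Line 13
        b_tau_k = b_tau + ((Y_k * Y_k).sum()       # Line 14
                           - (W_k * (W_k @ Lam_k)).sum()) / (2 * D_Y)
        E_tau = a_tau_k / b_tau_k                  # Line 15
        a_alpha_k = a_alpha + D_X * D_Y / 2        # Line 16
        b_alpha_k = b_alpha + 0.5 * (E_tau * (W_k * W_k).sum()
                                     + D_Y * np.trace(Lam_inv))  # Line 17
    return W_k, Lam_inv, a_tau_k, b_tau_k, a_alpha_k, b_alpha_k
```

With m_k = 1 for all inputs (a fully matching classifier) and ample low-noise data, W_k approaches the ordinary least-squares solution, since the shrinkage E_α(α_k) I becomes negligible relative to X_k^T X_k.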
to (7.97)–(7.100), L_k(q) is indeed maximised, which is not necessarily the case if r_nk = m_k(x_n), as discussed in Sect. 7.3.4. Therefore, every parameter update is guaranteed to increase L_k(q) until the algorithm converges.
In more detail, Lines 2 and 3 compute the matched input matrix X_k and output matrix Y_k, based on √m_k(x) √m_k(x) = m_k(x). Note that each column of X and Y is element-wise multiplied by √m_k, where the square root is applied to each element of m_k separately. The prior and hyperprior parameters are initialised with their prior parameter values in Lines 4 and 5.
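The column-wise weighting in Lines 2 and 3 can be checked numerically. This sketch, with made-up data, also confirms that X_k^T X_k then equals the matching-weighted scatter matrix Σ_n m_k(x_n) x_n x_n^T exploited further below:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))      # N = 5 inputs of dimension D_X = 3
m_k = rng.uniform(size=5)        # matching vector, one value per input

# Lines 2/3: every column of X is element-wise multiplied by sqrt(m_k),
# so that sqrt(m_k(x)) * sqrt(m_k(x)) = m_k(x) once products are formed.
X_k = X * np.sqrt(m_k)[:, None]

# X_k^T X_k equals the matching-weighted scatter matrix.
scatter = sum(m * np.outer(x, x) for m, x in zip(m_k, X))
print(np.allclose(X_k.T @ X_k, scatter))   # True
```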
In the actual iteration, Lines 9 to 14 compute the parameters of the variational posterior q*_{W,τ}(W_k, τ_k) by the use of (7.97)–(7.100) and (7.64). To get the weight vector covariance Λ_k^{-1}, the equality X_k^T X_k = Σ_n m_k(x_n) x_n x_n^T is used. The weight matrix W_k is evaluated by observing that the j-th row of Y_k^T X_k Λ_k^{-1}, giving w_kj^T, is equivalent to (Λ_k^{-1} Σ_n m_k(x_n) x_n y_nj)^T. The update of b_τk uses Sum(Y_k ⊗ Y_k), which effectively squares each element of Y_k before returning the sum over all elements, that is, Σ_j Σ_n m_k(x_n) y_nj^2. The term Σ_j w_kj^T Λ_k w_kj in (7.100) is computed by observing that it can be reformulated as the sum over all elements of the element-wise multiplication of W_k and W_k Λ_k.
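The last reformulation is easy to verify numerically; a quick sketch with arbitrary stand-in values for W_k and Λ_k:

```python
import numpy as np

rng = np.random.default_rng(2)
D_X, D_Y = 3, 2
W = rng.normal(size=(D_Y, D_X))   # stands in for W_k
A = rng.normal(size=(D_X, D_X))
Lam = A @ A.T                     # symmetric positive definite, like Λ_k

# Sum(W ⊗ W Λ): element-wise product of W and W Λ, summed over all
# elements, reproduces the quadratic-form sum over the rows of W.
lhs = (W * (W @ Lam)).sum()
rhs = sum(w @ Lam @ w for w in W)   # Σ_j w_kj^T Λ_k w_kj
print(np.isclose(lhs, rhs))   # True
```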