determined by the Kullback-Leibler divergence measure $\mathrm{KL}(R\|G)$ that measures the distance between the probability distributions given by $R$ and $G$. Formally, it is defined by $\mathrm{KL}(R\|G) = -\sum_n \sum_k r_{nk} \ln\bigl(g_k(x_n)/r_{nk}\bigr)$, and is represented in $\mathcal{L}_M(q)$ (7.95) by the terms $\mathbb{E}_{Z,V}(\ln p(Z|V)) - \mathbb{E}_Z(\ln q(Z))$, given by (7.84). As the Kullback-Leibler divergence is non-negative and zero if and only if $R = G$ [232], the algorithm assumes convergence of the IRLS algorithm if the change in $\mathrm{KL}(R\|G)$ between two successive iterations is below the system parameter $\Delta_s^{\mathrm{KL}(R\|G)}$.
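As a rough sketch of this convergence test (in NumPy-style Python; update_G_R, kl_divergence and irls_step are hypothetical helpers standing in for the steps detailed below, not functions from the algorithm itself), the loop could look as follows:

```python
import numpy as np

# Hypothetical skeleton of the IRLS loop inside TrainMixWeights; update_G_R,
# kl_divergence and irls_step stand in for the steps detailed below and are
# assumptions of this sketch, not functions quoted from the book.
def train_mix_weights_loop(V, Phi, E_beta, delta_s_kl, update_G_R, kl_divergence, irls_step):
    G, R = update_G_R(V)
    kl, kl_prev = kl_divergence(R, G), np.inf
    while abs(kl_prev - kl) >= delta_s_kl:           # change in KL(R||G) still too large?
        V = irls_step(V, Phi, G, R, E_beta)          # one IRLS update of the mixing weights
        G, R = update_G_R(V)                         # recompute with the updated weights
        kl_prev, kl = kl, kl_divergence(R, G)
    return V, G, R
```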
TrainMixWeights starts by computing the expectation $\mathbb{E}_\beta(\beta_k)$ for all $k$ in Line 1. The IRLS iteration (6.5) requires the error gradient $\nabla E(V)$ and the Hessian $H$, which are by (7.48) and (7.49) based on the values of $g_k(x_n)$ and $r_{nk}$. Hence, TrainMixWeights continues by computing $G$ and $R$ in Lines 2 and 3.
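A minimal sketch of these first steps, under the assumptions that $q(\beta_k)$ is a Gamma posterior with parameters $a_{\beta_k}, b_{\beta_k}$ (so that $\mathbb{E}_\beta(\beta_k) = a_{\beta_k}/b_{\beta_k}$) and that $g_k(x_n)$ is a generalised softmax of $v_k^T \phi(x_n)$; any matching weighting and the computation of $R$, which also depends on the classifier predictions, are omitted:

```python
import numpy as np

# Sketch of Lines 1-3 of TrainMixWeights (assumption-laden, not the book's
# exact algorithm): q(beta_k) is taken to be Gamma(a_beta[k], b_beta[k]), so
# E_beta(beta_k) = a_beta[k] / b_beta[k], and g_k(x_n) is a generalised
# softmax of v_k^T phi(x_n) without matching weighting.
def expectation_beta(a_beta, b_beta):
    return a_beta / b_beta                          # length-K vector of E_beta(beta_k)

def mixing(V, Phi):
    A = Phi @ V                                     # (N, K) matrix of v_k^T phi(x_n)
    A -= A.max(axis=1, keepdims=True)               # stabilise the exponentials
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True)   # rows of G sum to 1
```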
The error gradient $\nabla E(V)$ by (7.48) is evaluated in Lines 7 and 8. Line 7 uses the fact that $\Phi^T(G - R)$ results in a $D_V \times K$ matrix that has the vector $\sum_n (g_j(x_n) - r_{nj})\phi(x_n)$ as its $j$th column. Similarly, $V \mathbb{E}_\beta(\beta)$ results in a matrix of the same size, with $\mathbb{E}_\beta(\beta_j) v_j$ as its $j$th column. Line 8 rearranges the matrix $E$, which has $\nabla_{v_j} E(V)$ as its $j$th column, to the gradient vector $e = \nabla E(V)$.
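In NumPy-style notation, and assuming $\Phi$ of shape $(N, D_V)$, $G$ and $R$ of shape $(N, K)$, $V$ of shape $(D_V, K)$ with $v_j$ as its columns, and E_beta as the length-$K$ vector of $\mathbb{E}_\beta(\beta_j)$, Lines 7 and 8 could be sketched as:

```python
import numpy as np

# Sketch of Lines 7-8: assemble the matrix E and flatten it to the gradient
# vector e = nabla E(V); the names and shapes stated above are assumptions.
def error_gradient(Phi, G, R, V, E_beta):
    # j-th column: sum_n (g_j(x_n) - r_nj) phi(x_n) + E_beta(beta_j) v_j
    E = Phi.T @ (G - R) + V * E_beta
    # stack the columns (v_1, ..., v_K ordering) so that e matches the block
    # structure of the Hessian and can later be reshaped back to the shape of V
    return E.ravel(order='F')
```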
The Hessian $H$ is assembled in Line 9 by calling the Function Hessian, and is used in the next line to compute the vector $\Delta v$ by which the mixing weights need to be changed according to the IRLS algorithm (6.5). The mixing weight vector is updated by rearranging $\Delta v$ to the shape of $V$ in Line 12, and adding it to $V$ in the next line.
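A sketch of this Newton-Raphson step, reusing error_gradient from above and the hessian sketch given at the end of this section (all names are assumptions of these sketches):

```python
import numpy as np

# Sketch of Lines 9-13: one IRLS step for the mixing weights.
def irls_step(V, Phi, G, R, E_beta):
    D_V, K = V.shape
    e = error_gradient(Phi, G, R, V, E_beta)         # nabla E(V), length K * D_V
    H = hessian(Phi, G, E_beta, D_V)                 # (K D_V) x (K D_V) Hessian
    delta_v = np.linalg.solve(H, -e)                 # Delta v = -H^{-1} nabla E(V), without an explicit inverse
    return V + delta_v.reshape(D_V, K, order='F')    # rearrange Delta v to the shape of V and add
```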
As the mixing weights have changed, $G$ and $R$ are recomputed with the updated weights, to get $\mathrm{KL}(R\|G)$, and eventually to use it in the next iteration.
The Kullback-Leibler divergence between the responsibilities $R$ and their model $G$ is evaluated in Line 17, and then compared to its value from the last iteration to determine convergence of the IRLS algorithm. Note that due to the use of matrix operations, the elements of $R$ are not checked for being $r_{nk} = 0$ due to $g_k(x) = 0$ when computing the element-wise division $G \oslash R$, which might cause NaN entries in the resulting matrix. Even though these entries are multiplied by $r_{nk} = 0$ thereafter, they first need to be replaced by zero, as otherwise we would still get $0 \cdot \mathrm{NaN} = \mathrm{NaN}$.
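A NaN-safe evaluation of $\mathrm{KL}(R\|G)$ along these lines might look as follows (a sketch; the masking excludes the problematic entries before they enter the sum):

```python
import numpy as np

# Sketch of the NaN-safe evaluation of KL(R || G) in Line 17: entries with
# r_nk = 0 (caused by g_k(x_n) = 0) are excluded from the sum, since the
# naive element-wise computation would yield 0 * NaN = NaN for them.
def kl_divergence(R, G):
    mask = R > 0
    terms = np.zeros_like(R)
    terms[mask] = R[mask] * np.log(G[mask] / R[mask])
    return -np.sum(terms)        # non-negative, zero if and only if R == G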
The IRLS algorithm gives the mean of $q_V(V)$ as the mixing weights that minimise the error function $E(V)$. The covariance matrix $\Lambda_V$ still needs to be evaluated and is by (7.50) the inverse Hessian, as evaluated in Line 19. Due to its dependence on $G$, the last Hessian of the IRLS iteration in Line 9 cannot be used for that purpose, as $G$ has changed thereafter.
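As a sketch, reusing the hessian function sketched below:

```python
import numpy as np

# Sketch of Line 19: the covariance of q_V(V) is the inverse of a Hessian that
# is re-assembled from the final G; the Hessian from the last IRLS step would
# be based on an outdated G.
def posterior_covariance(Phi, G, E_beta, D_V):
    H = hessian(Phi, G, E_beta, D_V)
    return np.linalg.inv(H)      # covariance matrix by (7.50)
```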
To complete TrainMixWeights, let us consider how the Function Hessian assembles the Hessian matrix $H$: it first creates an empty $(K D_V) \times (K D_V)$ matrix that is thereafter filled by its block elements $H_{kj} = H_{jk}$, as given by (7.49). Here, the equality
$$\sum_n \phi(x_n)\, g_k(x_n)\, g_j(x_n)\, \phi(x_n)^T = \Phi^T \bigl(\Phi \odot (g_k \odot g_j)\bigr), \qquad (8.3)$$
where $g_k$ denotes the vector $(g_k(x_1), \dots, g_k(x_N))^T$ and $\odot$ the element-wise product applied to each column of $\Phi$, is used for the off-diagonal blocks of $H$ where $I_{kj} = 0$ in (7.49), and a similar relation is used to get the diagonal blocks of $H$.
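A sketch of this block-wise assembly, under the assumption (not quoted from the text) that (7.49) has the standard softmax-IRLS form $H_{kj} = \sum_n g_k(x_n)(I_{kj} - g_j(x_n))\phi(x_n)\phi(x_n)^T + I_{kj}\mathbb{E}_\beta(\beta_k) I$:

```python
import numpy as np

# Sketch of the Function Hessian; the assumed form of (7.49) is stated in the
# lead-in above. The blocks are filled using the identity (8.3) rather than an
# explicit sum over n.
def hessian(Phi, G, E_beta, D_V):
    N, K = G.shape
    H = np.zeros((K * D_V, K * D_V))                 # empty (K D_V) x (K D_V) matrix
    for k in range(K):
        for j in range(k, K):
            if j == k:
                # diagonal block: Phi^T (Phi * g_k(1 - g_k)) + E_beta(beta_k) I
                gk = G[:, k]
                block = Phi.T @ (Phi * (gk * (1.0 - gk))[:, None]) + E_beta[k] * np.eye(D_V)
            else:
                # off-diagonal block (I_kj = 0): -Phi^T (Phi * (g_k g_j)), by (8.3)
                block = -Phi.T @ (Phi * (G[:, k] * G[:, j])[:, None])
            H[k*D_V:(k+1)*D_V, j*D_V:(j+1)*D_V] = block
            H[j*D_V:(j+1)*D_V, k*D_V:(k+1)*D_V] = block   # H_kj = H_jk
    return H
```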
 