Function VarMixBound(G, R, V, Λ_V^−1, a_β, b_β)

Input: mixing matrix G, responsibilities matrix R, mixing weight matrix V, mixing covariance matrix Λ_V^−1, mixing weight prior parameters a_β, b_β
Output: mixing component L_M(q) of variational bound

1   get D_V, K from shape of V
2   L_{M,1}(q) ← K(−ln Γ(a_β) + a_β ln b_β)
3   for k = 1 to K do
4       a_{β_k}, b_{β_k} ← pick from a_β, b_β
5       L_{M,1}(q) ← L_{M,1}(q) + ln Γ(a_{β_k}) − a_{β_k} ln b_{β_k}
6   L_{M,2}(q) ← Sum(R ⊗ FixNaN(ln(G ⊘ R), 0))
7   L_{M,3}(q) ← (1/2) ln |Λ_V^−1| + K D_V / 2
8   return L_{M,1}(q) + L_{M,2}(q) + L_{M,3}(q)
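To make the steps above concrete, here is a minimal NumPy/SciPy sketch of the function (an illustration, not the book's implementation). It assumes V is stored as a K × D_V matrix, that a_beta and b_beta are length-K arrays of the posterior Gamma parameters, and that A_BETA and B_BETA stand for the scalar prior hyperparameters; the values chosen for them here are illustrative. FixNaN is emulated with np.nan_to_num.

```python
import numpy as np
from scipy.special import gammaln

# Scalar prior hyperparameters of the mixing weight priors
# (illustrative values; an assumption, not taken from the text).
A_BETA, B_BETA = 1e-2, 1e-4

def var_mix_bound(G, R, V, a_beta, b_beta, Lambda_V_inv):
    """Sketch of VarMixBound: returns the mixing component L_M(q).

    G, R           -- N x K mixing and responsibility matrices
    V              -- mixing weight matrix, assumed K x D_V (only its shape is used)
    a_beta, b_beta -- length-K arrays of posterior Gamma parameters
    Lambda_V_inv   -- (K * D_V) x (K * D_V) mixing covariance matrix
    """
    K, D_V = V.shape                                        # Line 1
    # Lines 2-5: terms that depend on the parameters of q_beta(beta)
    L_M1 = K * (-gammaln(A_BETA) + A_BETA * np.log(B_BETA))
    L_M1 += np.sum(gammaln(a_beta) - a_beta * np.log(b_beta))
    # Line 6: Sum(R (x) FixNaN(ln(G / R), 0)); NaNs arise only in cells
    # where g_nk = r_nk = 0, and FixNaN sets those cells to 0
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.log(G / R)
    L_M2 = np.sum(R * np.nan_to_num(log_ratio, nan=0.0))
    # Line 7: (1/2) ln|Lambda_V^-1| + K D_V / 2
    _, logdet = np.linalg.slogdet(Lambda_V_inv)
    L_M3 = 0.5 * logdet + K * D_V / 2.0
    return L_M1 + L_M2 + L_M3                               # Line 8
```

With the posterior parameters equal to the prior, the Lines 2 to 5 terms cancel, which gives a quick sanity check of the implementation.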
Function VarMixBound computes the mixing component L_M(q) of L(q) by evaluating (7.95). As in VarClBound, the computation of L_M(q) is split into the components L_{M,1}(q), L_{M,2}(q), and L_{M,3}(q), whose sum is returned. L_{M,1}(q) contains the components of L_M(q) that depend on the parameters of q_β(β), and is computed in Lines 2 to 5 by iterating over all k. L_{M,2}(q) is the Kullback-Leibler divergence KL(R‖G), as given by (7.84), which is computed in the same way as in Line 17 of Function TrainMixWeights.
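Regarding the FixNaN convention in this divergence: cells with r_nk = g_nk = 0 produce ln(0/0) = NaN, which FixNaN maps to 0 so that such cells contribute nothing. A small sketch of this computation (note that Σ r ln(g/r) is the negative of the KL divergence in the usual sign convention, and is therefore at most 0):

```python
import numpy as np

def neg_kl_R_G(R, G):
    """Sum(R (x) FixNaN(ln(G / R), 0)), i.e. -KL(R||G) with 0*ln(0/0) := 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        log_ratio = np.log(G / R)
    return float(np.sum(R * np.nan_to_num(log_ratio, nan=0.0)))

# Cells where both r_nk and g_nk are zero contribute nothing:
R = np.array([[1.0, 0.0], [0.5, 0.5]])
print(neg_kl_R_G(R, R))  # identical matrices give divergence 0.0
```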
8.1.5 Scaling Issues
Let us now consider how the presented algorithm scales with the dimensionality of the input space D_X, the dimensionality of the output space D_Y, the dimensionality of the mixing feature space D_V, the number N of observations that are available, and the number K of classifiers. All O(·) estimates are based on the observation that the multiplication of an a × b matrix with a b × c matrix scales with O(abc), and that the inversion and the computation of the determinant of an a × a matrix have complexity O(a³) and O(a²), respectively.
Table 8.1 gives an overview of how the different functions scale with N, K, D_X, D_Y, and D_V. Unfortunately, even though ModelProbability scales linearly with N and D_Y, it scales well neither with D_X nor with K and D_V. In all three cases, the cubic dependency is caused by a matrix inversion.
Considering that D_X³ is due to inverting the precision matrix Λ_k, it might be reducible to D_X² by using the Sherman-Morrison formula, as shown in Sect. 5.3.5. D_X is the dimensionality of the input space with respect to the classifier model, and is given by D_X = 1 for averaging classifiers, and by D_X = 2 for classifiers that model straight lines. Thus, it is in general not too high, and D_X³ will not be the most influential complexity component. In any case, as long as we are required to maintain a covariance matrix Λ_k^−1 of size D_X × D_X, the influence of D_X is unlikely to be reducible below D_X².
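The Sherman-Morrison formula referred to above replaces a full O(D³) re-inversion by an O(D²) rank-one update of an already known inverse. A generic sketch (illustrative, not the book's code):

```python
import numpy as np

def sherman_morrison(A_inv, u, v):
    """Given A^{-1}, return (A + u v^T)^{-1} in O(D^2) via Sherman-Morrison:
    (A + u v^T)^{-1} = A^{-1} - (A^{-1} u v^T A^{-1}) / (1 + v^T A^{-1} u).
    """
    Au = A_inv @ u              # A^{-1} u, O(D^2)
    vA = v @ A_inv              # v^T A^{-1}, O(D^2)
    denom = 1.0 + v @ Au        # scalar; must be nonzero
    return A_inv - np.outer(Au, vA) / denom

# Check against direct inversion on a small well-conditioned matrix
rng = np.random.default_rng(0)
D = 4
A = np.eye(D) + 0.1 * rng.standard_normal((D, D))
u, v = rng.standard_normal(D), rng.standard_normal(D)
updated = sherman_morrison(np.linalg.inv(A), u, v)
assert np.allclose(updated, np.linalg.inv(A + np.outer(u, v)))
```

The rank-one structure is what makes the saving possible; it applies here because each new observation changes Λ_k by an outer product of its input vector.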
The biggest weakness of the prototype algorithm that was presented here is that the number of operations required to find the parameters of the mixing model scales with K³D_V³. This is due to the inversion of the (KD_V) × (KD_V) Hessian matrix that is required at each iteration of the IRLS algorithm. To apply