where $H$ is the Hessian matrix of $E(V)$ as used in the IRLS algorithm. Overall, the Laplace approximation to the posterior $q_V(V)$ is given by the multivariate Gaussian

$$ q_V(V) \approx \mathcal{N}(V \mid V^*, \Lambda_V^{-1}), \qquad (7.51) $$

where $V^*$ is the solution to (7.47), and $\Lambda_V$ is the Hessian matrix evaluated at $V^*$.
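To make the Laplace step concrete, the following Python sketch finds a posterior mode numerically and takes the inverse Hessian at that mode as the Gaussian covariance, as in (7.51). It is a minimal illustration under assumed simplifications: a two-class logistic stand-in for the book's softmax mixing model, and hypothetical names (`neg_log_post`, `laplace_approx`); it is not the book's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_post(v, X, r, beta):
    """E(v): negative log-posterior of the mixing weights of one classifier.
    Hypothetical stand-in: logistic likelihood + Gaussian shrinkage prior."""
    g = 1.0 / (1.0 + np.exp(-X @ v))               # mixing probabilities
    nll = -np.sum(r * np.log(g) + (1.0 - r) * np.log(1.0 - g))
    return nll + 0.5 * beta * (v @ v)              # shrinkage term (beta/2) v'v

def laplace_approx(X, r, beta):
    """Gaussian approximation N(v | v_star, Lambda^{-1}) in the spirit of (7.51)."""
    D = X.shape[1]
    v_star = minimize(neg_log_post, np.zeros(D), args=(X, r, beta)).x
    # Hessian of E(v) at the mode: X' diag(g(1-g)) X + beta*I, playing the
    # role of Lambda_V; its inverse is the Gaussian covariance.
    g = 1.0 / (1.0 + np.exp(-X @ v_star))
    Lambda = X.T @ (X * (g * (1.0 - g))[:, None]) + beta * np.eye(D)
    return v_star, np.linalg.inv(Lambda)
```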
7.3.5 Mixing Weight Priors $q_\beta(\beta)$
By (7.19), $p(\beta)$ factorises with respect to $k$, and thus allows us to find $q_\beta(\beta)$ for each classifier separately, which, by (7.15), (7.18) and (7.24), requires the evaluation of

$$ \ln q_\beta(\beta_k) = E_V(\ln p(v_k \mid \beta_k)) + \ln p(\beta_k). \qquad (7.52) $$
Using (7.13) and (7.14), the expectation and log-density are given by
$$ E_V(\ln p(v_k \mid \beta_k)) = \frac{D_V}{2} \ln \beta_k - \frac{\beta_k}{2} E_V(v_k^T v_k) + \text{const.}, \qquad (7.53) $$

$$ \ln p(\beta_k) = (a_\beta - 1) \ln \beta_k - \beta_k b_\beta + \text{const.} \qquad (7.54) $$
Combining the above, we get the variational posterior
$$ \begin{aligned} \ln q_\beta(\beta_k) &= \left( a_\beta - 1 + \frac{D_V}{2} \right) \ln \beta_k - \left( b_\beta + \frac{1}{2} E_V(v_k^T v_k) \right) \beta_k + \text{const.} \\ &= \ln \text{Gam}(\beta_k \mid a_{\beta_k}, b_{\beta_k}), \end{aligned} \qquad (7.55) $$
with the distribution parameters
$$ a_{\beta_k} = a_\beta + \frac{D_V}{2}, \qquad (7.56) $$

$$ b_{\beta_k} = b_\beta + \frac{1}{2} E_V(v_k^T v_k). \qquad (7.57) $$
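Since $q_V$ is Gaussian, the expectation in (7.57) follows from the standard second-moment identity $E_V(v_k^T v_k) = v_k^{*T} v_k^* + \mathrm{Tr}(\Lambda_{v_k}^{-1})$, where $\Lambda_{v_k}^{-1}$ denotes the covariance block belonging to $v_k$. A minimal Python sketch of the updates (7.56) and (7.57), assuming that mean and covariance block are available (function and variable names are illustrative):

```python
import numpy as np

def update_gamma_posterior(a_beta, b_beta, v_star, cov_v):
    """Variational update of q_beta(beta_k) = Gam(beta_k | a_beta_k, b_beta_k).
    v_star : posterior mean of v_k;  cov_v : its covariance block."""
    D_V = v_star.shape[0]
    E_vv = v_star @ v_star + np.trace(cov_v)   # E_V(v_k' v_k)
    a_k = a_beta + D_V / 2.0                   # (7.56)
    b_k = b_beta + 0.5 * E_vv                  # (7.57)
    return a_k, b_k
```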
As the priors on $v_k$ are similar to the ones on $w_k$, they cause the same effect: as $b_{\beta_k}$ increases proportionally to the expected size $\|v_k\|^2$, the expectation $E_\beta(\beta_k) = a_{\beta_k}/b_{\beta_k}$ decreases in proportion to it. This expectation determines the shrinkage on $v_k$ (see (7.47)), and thus the strength of the shrinkage prior is reduced if $v_k$ is expected to have large elements, which is an intuitively sensible procedure.
7.3.6 Latent Variables $q_Z(Z)$
To get the variational posterior over the latent variables $Z$, we need to evaluate (7.24) by the use of (7.15), that is,

$$ \ln q_Z(Z) = E_{W,\tau}(\ln p(Y \mid W, \tau, Z)) + E_V(\ln p(Z \mid V)) + \text{const.} \qquad (7.58) $$
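Although (7.58) is expanded only after this point, its structure already fixes the computation: each observation contributes an unnormalised log-responsibility that is the sum of the two expectation terms, normalised over the classifiers. A hedged Python sketch under that reading, with the expectation terms assumed to be precomputed arrays (not the book's code):

```python
import numpy as np
from scipy.special import logsumexp

def update_latent_posterior(E_log_lik, E_log_gate):
    """Variational update of q_Z(Z) in the spirit of (7.58).
    E_log_lik[n, k]  : assumed E_{W,tau}(ln p(y_n | w_k, tau_k))
    E_log_gate[n, k] : assumed E_V(ln g_k(x_n))
    Returns responsibilities r[n, k] = q_Z(z_nk = 1), rows summing to 1."""
    log_rho = E_log_lik + E_log_gate           # unnormalised, up to a const.
    log_r = log_rho - logsumexp(log_rho, axis=1, keepdims=True)
    return np.exp(log_r)
```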