with
\[
a_{\alpha_k} = a_\alpha + \frac{D_X D_Y}{2}, \tag{7.40}
\]
\[
b_{\alpha_k} = b_\alpha + \frac{1}{2} \sum_j \mathbb{E}_{W,\tau}\!\left( \tau_k \mathbf{w}_{kj}^{\mathsf{T}} \mathbf{w}_{kj} \right). \tag{7.41}
\]
Utilising again the relation between the gamma distribution and the scaled inverse $\chi^2$ distribution, (7.40) increments the initial $2a_\alpha$ degrees of freedom by $D_X D_Y$, which is the number of elements in $\mathbf{W}_k$.
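As a concrete illustration, the following sketch implements the updates (7.40) and (7.41) in Python. The function and variable names are hypothetical, and the expectations $\mathbb{E}_{W,\tau}(\tau_k \mathbf{w}_{kj}^{\mathsf{T}} \mathbf{w}_{kj})$ are assumed to have already been computed from the weight and noise posteriors of classifier $k$:

```python
import numpy as np

def update_alpha_posterior(a_alpha, b_alpha, D_X, D_Y, exp_tau_w_sq):
    """Variational update of q_alpha(alpha_k) = Gam(alpha_k | a_alpha_k, b_alpha_k).

    a_alpha, b_alpha : scalar hyper-parameters of the Gamma prior on alpha_k
    D_X, D_Y         : input and output dimensionality, so W_k has D_X * D_Y elements
    exp_tau_w_sq     : length-D_Y array holding E_{W,tau}(tau_k w_kj^T w_kj),
                       one value per output dimension j (assumed precomputed)
    """
    # (7.40): the shape parameter grows by half the number of elements of W_k
    a_alpha_k = a_alpha + 0.5 * D_X * D_Y
    # (7.41): the rate parameter grows by half the expected tau-weighted
    # squared norms of the weight vectors, summed over the output dimensions
    b_alpha_k = b_alpha + 0.5 * np.sum(exp_tau_w_sq)
    return a_alpha_k, b_alpha_k
```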
The posterior mean of $\alpha_k$ is $\mathbb{E}(\alpha_k) = a_{\alpha_k}/b_{\alpha_k}$ and is thus inversely proportional to the size of the weight vectors, $\|\mathbf{w}_{kj}\|^2 = \mathbf{w}_{kj}^{\mathsf{T}} \mathbf{w}_{kj}$, and the noise precision $\tau_k$. As the element-wise variance in the weight vector prior (7.8) is given by $(\alpha_k \tau_k)^{-1}$, the effect of $\tau_k$ on that prior is diminished. Thus, the weight vector prior variance is proportional to the expected size of the weight vectors, which has the effect of spreading the weight vector prior if the weight vector is expected to be large, effectively reducing the shrinkage. Intuitively, this is a sensible thing to do, as one should refrain from using an overly strong shrinkage prior if the weight vector is expected to have large elements.
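This reduced shrinkage can be seen numerically by continuing the sketch above with a small example (all numbers hypothetical):

```python
# Hypothetical setting: D_X = 3 inputs, D_Y = 1 output, vague prior a_alpha = b_alpha = 1e-2,
# and E(tau_k) = 1 for simplicity.
exp_tau = 1.0
for exp_tau_w_sq in (np.array([0.5]), np.array([50.0])):   # small vs. large expected weights
    a_k, b_k = update_alpha_posterior(1e-2, 1e-2, 3, 1, exp_tau_w_sq)
    exp_alpha = a_k / b_k                    # posterior mean E(alpha_k) = a_alpha_k / b_alpha_k
    prior_var = 1.0 / (exp_alpha * exp_tau)  # element-wise prior variance (alpha_k tau_k)^-1
    print(exp_tau_w_sq[0], exp_alpha, prior_var)
# Larger expected weights lower E(alpha_k) and widen the prior, i.e. weaker shrinkage.
```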
7.3.4 Mixing Model $q_V(V)$
We get the variational posterior $q_V(V)$ on the mixing model parameters by solving (7.24) with (7.15), that is,
\[
\ln q_V(V) = \mathbb{E}_Z(\ln p(Z \mid V)) + \mathbb{E}_\beta(\ln p(V \mid \beta)) + \text{const.} \tag{7.42}
\]
Even though $q_V$ factorises with respect to $k$, we will solve it for all classifiers simultaneously due to the Laplace approximation that is applied thereafter. Evaluating the expectations by using (7.12), (7.13) and (7.19), we get
\[
\mathbb{E}_Z(\ln p(Z \mid V)) = \sum_n \sum_k r_{nk} \ln g_k(\mathbf{x}_n), \tag{7.43}
\]
\[
\mathbb{E}_\beta(\ln p(V \mid \beta)) = \sum_k \mathbb{E}_\beta\!\left( \ln \mathcal{N}\!\left( \mathbf{v}_k \mid \mathbf{0}, \beta_k^{-1} \mathbf{I} \right) \right)
= - \sum_k \frac{\mathbb{E}_\beta(\beta_k)}{2} \mathbf{v}_k^{\mathsf{T}} \mathbf{v}_k + \text{const.}, \tag{7.44}
\]
where $r_{nk} \equiv \mathbb{E}_Z(z_{nk})$ was used. Thus, the variational log-posterior evaluates to
\[
\ln q_V(V) = \sum_k \left( \sum_n r_{nk} \ln g_k(\mathbf{x}_n) - \frac{\mathbb{E}_\beta(\beta_k)}{2} \mathbf{v}_k^{\mathsf{T}} \mathbf{v}_k \right) + \text{const.} \tag{7.45}
\]
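A minimal sketch of evaluating (7.45) up to its additive constant is given below. It assumes, purely for illustration, a plain softmax gating $g_k(\mathbf{x}) = \exp(\mathbf{v}_k^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x})) / \sum_j \exp(\mathbf{v}_j^{\mathsf{T}} \boldsymbol{\phi}(\mathbf{x}))$ on mixing features $\boldsymbol{\phi}(\mathbf{x})$, and takes the responsibilities $r_{nk}$ and expectations $\mathbb{E}_\beta(\beta_k)$ as precomputed inputs; all names are hypothetical:

```python
import numpy as np

def log_q_V(V, Phi, R, exp_beta):
    """Evaluate ln q_V(V) of (7.45), up to the additive constant.

    V        : (K, D_V) array of mixing vectors v_k (one per row)
    Phi      : (N, D_V) array of mixing features phi(x_n) (one per row)
    R        : (N, K) array of responsibilities r_nk = E_Z(z_nk)
    exp_beta : (K,) array of expectations E_beta(beta_k)
    """
    A = Phi @ V.T                                  # A[n, k] = v_k^T phi(x_n)
    A_max = A.max(axis=1, keepdims=True)           # shift for numerical stability
    log_norm = A_max + np.log(np.exp(A - A_max).sum(axis=1, keepdims=True))
    log_g = A - log_norm                           # ln g_k(x_n) under the softmax gating
    data_term = np.sum(R * log_g)                  # sum_n sum_k r_nk ln g_k(x_n)
    prior_term = -0.5 * np.sum(exp_beta * np.sum(V * V, axis=1))  # -sum_k E(beta_k)/2 v_k^T v_k
    return data_term + prior_term
```

In a variational implementation, this expression would typically be maximised with respect to $V$ to locate the mode at which the subsequent Laplace approximation fits a Gaussian.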
Note that the distributional form of this posterior differs from that of its prior (7.13), which would cause problems in further derivations. Thus, we proceed the same
 