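Thus, evaluating (7.26) gives

$$
\begin{aligned}
\ln q^*_{W,\tau}(W_k, \tau_k) ={}& \left( D_Y a_\tau - D_Y + \frac{D_X D_Y}{2} + \frac{D_Y}{2} \sum_n r_{nk} \right) \ln \tau_k \\
& - \frac{\tau_k}{2} \left( 2 D_Y b_\tau + \sum_j \left( \sum_n r_{nk} y_{nj}^2 - 2 w_{kj}^T \sum_n r_{nk} x_n y_{nj} + w_{kj}^T \left( \mathbb{E}_\alpha(\alpha_k) I + \sum_n r_{nk} x_n x_n^T \right) w_{kj} \right) \right) + \text{const.} \\
={}& \ln \prod_j \mathcal{N}\left( w_{kj} \mid w^*_{kj}, (\tau_k \Lambda^*_k)^{-1} \right) \mathrm{Gam}\left( \tau_k \mid a^*_{\tau k}, b^*_{\tau k} \right),
\end{aligned}
\tag{7.29}
$$

with the distribution parameters

$$
\Lambda^*_k = \mathbb{E}_\alpha(\alpha_k) I + \sum_n r_{nk} x_n x_n^T, \tag{7.30}
$$

$$
w^*_{kj} = {\Lambda^*_k}^{-1} \sum_n r_{nk} x_n y_{nj}, \tag{7.31}
$$

$$
a^*_{\tau k} = a_\tau + \frac{1}{2} \sum_n r_{nk}, \tag{7.32}
$$

$$
b^*_{\tau k} = b_\tau + \frac{1}{2 D_Y} \sum_j \left( \sum_n r_{nk} y_{nj}^2 - {w^*_{kj}}^T \Lambda^*_k w^*_{kj} \right). \tag{7.33}
$$
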
The second equality in (7.29) can be derived by expanding the final result and
replacing all terms that are independent of $W_k$ and $\tau_k$ by a constant. The
distribution parameter update equations are those of a standard Bayesian weighted
linear regression (for example, [19, 15, 72]).
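
The parameter updates (7.30) to (7.33) can be sketched in NumPy as follows. This is a minimal illustration rather than the author's implementation: the function name `classifier_posterior`, the argument names, and the synthetic data in the usage lines are all assumptions made for the example, and the value of $\mathbb{E}_\alpha(\alpha_k)$ is simply passed in as a number.

```python
import numpy as np


def classifier_posterior(X, Y, r, e_alpha, a_tau, b_tau):
    """Posterior parameters of a single classifier k, following (7.30)-(7.33).

    X: (N, D_X) inputs x_n as rows; Y: (N, D_Y) outputs y_n as rows;
    r: (N,) responsibilities r_nk; e_alpha: scalar E_alpha(alpha_k);
    a_tau, b_tau: parameters of the Gamma prior on the noise precision.
    """
    D_X = X.shape[1]
    D_Y = Y.shape[1]
    Xr = X * r[:, None]                       # row n is r_nk * x_n^T
    Lam = e_alpha * np.eye(D_X) + Xr.T @ X    # (7.30)
    W = np.linalg.solve(Lam, Xr.T @ Y)        # (7.31); column j is w*_kj
    a = a_tau + 0.5 * r.sum()                 # (7.32)
    # (7.33): sum_j ( sum_n r_nk y_nj^2 - w*_kj^T Lam w*_kj )
    resid = (r[:, None] * Y**2).sum() - np.sum(W * (Lam @ W))
    b = b_tau + resid / (2.0 * D_Y)
    return Lam, W, a, b


# Hypothetical usage on synthetic data (all values are illustrative only).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=50)])
Y = X @ np.array([[1.0], [2.0]]) + 0.1 * rng.normal(size=(50, 1))
r = rng.uniform(size=50)
Lam, W, a, b = classifier_posterior(X, Y, r, e_alpha=1.0, a_tau=1e-2, b_tau=1e-4)
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of $\Lambda^*_k$ explicitly, which is the usual choice when only the weight means are needed.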
Note that, due to the use of conjugate priors, the variational posterior
$q^*_{W,\tau}(W_k, \tau_k)$ (7.29) has the same distribution form as the prior
$p(W_k, \tau_k \mid \alpha_k)$ (7.8).
The resulting weight vector $w_{kj}$, which models the relation between the inputs
and the $j$th component of the outputs, is given by a Gaussian with mean $w^*_{kj}$
and precision $\tau_k \Lambda^*_k$. The same posterior weight mean can be found by minimising

$$
\| X w_{kj} - y_j \|^2_{R_k} + \mathbb{E}_\alpha(\alpha_k) \| w_{kj} \|^2, \tag{7.34}
$$

with respect to $w_{kj}$, where $\| a \|^2_{R_k} = a^T R_k a$, $R_k$ is the diagonal matrix
$R_k = \mathrm{diag}(r_{1k}, \dots, r_{Nk})$, and $y_j$ is the vector of $j$th output elements,
$y_j = (y_{1j}, \dots, y_{Nj})^T$, that is, the $j$th column of $Y$. This shows that we are
performing a responsibility-weighted ridge regression with ridge complexity
$\mathbb{E}_\alpha(\alpha_k)$. Thus, the shrinkage is determined by
the prior on $\alpha_k$, as can be expected from the specification of the weight vector
prior (7.8).
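
The ridge-regression reading of the weight mean can be checked numerically. The sketch below uses made-up data and an arbitrary value for $\mathbb{E}_\alpha(\alpha_k)$ (all variable names are assumptions for the example). It computes the closed-form weighted ridge solution, confirms that the gradient of the cost (7.34) vanishes there, and compares it with an ordinary least-squares solve on an augmented system, a standard way of writing ridge regression.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D_X = 40, 3
X = rng.normal(size=(N, D_X))   # inputs x_n as rows
y = rng.normal(size=N)          # stands in for y_j, the j-th column of Y
r = rng.uniform(size=N)         # responsibilities r_nk, the diagonal of R_k
e_alpha = 0.5                   # assumed value of E_alpha(alpha_k)

# Closed-form minimiser: (E_alpha(alpha_k) I + X^T R_k X)^{-1} X^T R_k y_j
Lam = e_alpha * np.eye(D_X) + X.T @ (r[:, None] * X)
w_star = np.linalg.solve(Lam, X.T @ (r * y))

# Gradient of (7.34) at w_star: 2 X^T R_k (X w - y_j) + 2 E_alpha(alpha_k) w
grad = 2.0 * X.T @ (r * (X @ w_star - y)) + 2.0 * e_alpha * w_star

# Same minimiser via least squares on the augmented system
# [sqrt(R_k) X; sqrt(e_alpha) I] w ~ [sqrt(R_k) y_j; 0]
A = np.vstack([np.sqrt(r)[:, None] * X, np.sqrt(e_alpha) * np.eye(D_X)])
t = np.concatenate([np.sqrt(r) * y, np.zeros(D_X)])
w_ls = np.linalg.lstsq(A, t, rcond=None)[0]

print(np.allclose(grad, 0.0, atol=1e-8), np.allclose(w_ls, w_star))
```

Because the cost is strictly convex whenever `e_alpha` is positive, a vanishing gradient identifies the unique minimiser.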
The noise precision posterior is the Gamma distribution
$\mathrm{Gam}(\tau_k \mid a^*_{\tau k}, b^*_{\tau k})$.
Using the relation $\nu\lambda / \chi^2_\nu \sim \mathrm{Gam}(\nu/2, \nu\lambda/2)$, where
$\nu\lambda / \chi^2_\nu$ is the scaled inverse $\chi^2$
 