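Thus, evaluating (7.26) gives

$$
\begin{aligned}
\ln q^*_{W,\tau}(\mathbf{W}_k, \tau_k)
&= \Big( D_Y a_\tau - D_Y + \frac{D_X D_Y}{2} + \frac{D_Y}{2} \sum_n r_{nk} \Big) \ln \tau_k \\
&\quad - \frac{\tau_k}{2} \Big( 2 D_Y b_\tau + \sum_j \Big( \sum_n r_{nk} y_{nj}^2 - 2 \mathbf{w}_{kj}^T \sum_n r_{nk} \mathbf{x}_n y_{nj} \\
&\qquad\qquad + \mathbf{w}_{kj}^T \Big( \mathbb{E}_\alpha(\alpha_k) \mathbf{I} + \sum_n r_{nk} \mathbf{x}_n \mathbf{x}_n^T \Big) \mathbf{w}_{kj} \Big) \Big) + \text{const.} \\
&= \ln \prod_j \Big( \mathcal{N}\big(\mathbf{w}_{kj} \,|\, \mathbf{w}^*_{kj}, (\tau_k \boldsymbol{\Lambda}^*_k)^{-1}\big) \, \mathrm{Gam}(\tau_k \,|\, a^*_{\tau_k}, b^*_{\tau_k}) \Big),
\end{aligned}
\tag{7.29}
$$

with the distribution parameters

$$
\boldsymbol{\Lambda}^*_k = \mathbb{E}_\alpha(\alpha_k) \mathbf{I} + \sum_n r_{nk} \mathbf{x}_n \mathbf{x}_n^T, \tag{7.30}
$$

$$
\mathbf{w}^*_{kj} = \boldsymbol{\Lambda}_k^{*-1} \sum_n r_{nk} \mathbf{x}_n y_{nj}, \tag{7.31}
$$

$$
a^*_{\tau_k} = a_\tau + \frac{1}{2} \sum_n r_{nk}, \tag{7.32}
$$

$$
b^*_{\tau_k} = b_\tau + \frac{1}{2 D_Y} \sum_j \Big( \sum_n r_{nk} y_{nj}^2 - \mathbf{w}^{*T}_{kj} \boldsymbol{\Lambda}^*_k \mathbf{w}^*_{kj} \Big). \tag{7.33}
$$
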
The second equality in (7.29) can be derived by expanding the final result and replacing all terms that are independent of $\mathbf{W}_k$ and $\tau_k$ by a constant. The distribution parameter update equations are those of a standard Bayesian weighted linear regression (for example, [19, 15, 72]).
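
The update equations (7.30) to (7.33) can be sketched in NumPy for a single classifier k. This is a minimal sketch under stated assumptions, not the author's implementation: the function name `classifier_posterior`, the argument names, and the array layout (rows of `X` are the inputs x_n, rows of `Y` are the outputs y_n, `r` holds the responsibilities r_nk, and `e_alpha` stands for the expectation of alpha_k) are all chosen here for illustration.

```python
import numpy as np

def classifier_posterior(X, Y, r, e_alpha, a_tau, b_tau):
    """Evaluate (7.30)-(7.33) for one classifier k (illustrative sketch).

    X: (N, D_X) inputs, Y: (N, D_Y) outputs, r: (N,) responsibilities r_nk,
    e_alpha: scalar expectation of alpha_k, a_tau, b_tau: prior Gamma parameters.
    """
    D_X = X.shape[1]
    D_Y = Y.shape[1]
    Xr = X * r[:, None]                       # each row x_n scaled by r_nk
    Lam = e_alpha * np.eye(D_X) + Xr.T @ X    # (7.30): posterior precision factor
    W = np.linalg.solve(Lam, Xr.T @ Y)        # (7.31): column j is the mean of w_kj
    a = a_tau + 0.5 * r.sum()                 # (7.32)
    # (7.33): sum over j of (sum_n r_nk y_nj^2 - mean_j^T Lam mean_j)
    resid = (r[:, None] * Y**2).sum() - np.sum(W * (Lam @ W))
    b = b_tau + resid / (2.0 * D_Y)
    return Lam, W, a, b
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse of the precision factor explicitly. The returned `W` stacks the posterior weight means as columns, one per output dimension.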
Note that due to the use of conjugate priors, the variational posterior $q^*_{W,\tau}(\mathbf{W}_k, \tau_k)$ (7.29) has the same distribution form as the prior $p(\mathbf{W}_k, \tau_k \,|\, \alpha_k)$ (7.8).
The resulting weight vector $\mathbf{w}_{kj}$, which models the relation between the inputs and the $j$th component of the outputs, is given by a Gaussian with mean $\mathbf{w}^*_{kj}$ and precision $\tau_k \boldsymbol{\Lambda}^*_k$. The same posterior weight mean can be found by minimising

$$
\| \mathbf{X} \mathbf{w}_{kj} - \mathbf{y}_j \|^2_{\mathbf{R}_k} + \mathbb{E}_\alpha(\alpha_k) \| \mathbf{w}_{kj} \|^2, \tag{7.34}
$$

with respect to $\mathbf{w}_{kj}$, where $\mathbf{R}_k$ is the diagonal matrix $\mathbf{R}_k = \mathrm{diag}(r_{1k}, \dots, r_{Nk})$, and $\mathbf{y}_j$ is the vector of $j$th output elements, $\mathbf{y}_j = (y_{1j}, \dots, y_{Nj})^T$, that is, the $j$th column of $\mathbf{Y}$. This shows that we are performing a responsibility-weighted ridge regression with ridge complexity $\mathbb{E}_\alpha(\alpha_k)$. Thus, the shrinkage is determined by the prior on $\alpha_k$, as can be expected from the specification of the weight vector prior (7.8).
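
The equivalence with ridge regression can be checked numerically. The sketch below assumes that the weighted norm in (7.34) means the quadratic form v^T R_k v, and it minimises the objective by rewriting it as an ordinary least-squares problem on augmented data; the helper names `ridge_objective` and `weighted_ridge`, the random test data, and the value of `e_alpha` are illustrative assumptions.

```python
import numpy as np

def ridge_objective(w, X, y, r, e_alpha):
    """(7.34), assuming the weighted norm is resid^T diag(r) resid."""
    resid = X @ w - y
    return resid @ (r * resid) + e_alpha * (w @ w)

def weighted_ridge(X, y, r, e_alpha):
    """Minimise (7.34) as ordinary least squares on augmented data."""
    D_X = X.shape[1]
    sqrt_r = np.sqrt(r)
    # Rows sqrt(r_n) x_n reproduce the weighted error term;
    # rows sqrt(e_alpha) I reproduce the penalty on ||w||^2.
    A = np.vstack([sqrt_r[:, None] * X, np.sqrt(e_alpha) * np.eye(D_X)])
    t = np.concatenate([sqrt_r * y, np.zeros(D_X)])
    w, *_ = np.linalg.lstsq(A, t, rcond=None)
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))      # inputs x_n as rows
y = rng.normal(size=50)           # j-th output column y_j
r = rng.uniform(size=50)          # responsibilities r_nk
e_alpha = 0.3                     # stands in for the expectation of alpha_k

Xr = X * r[:, None]
Lam = e_alpha * np.eye(4) + Xr.T @ X        # (7.30)
w_post = np.linalg.solve(Lam, Xr.T @ y)     # (7.31)
w_ridge = weighted_ridge(X, y, r, e_alpha)
print(np.allclose(w_post, w_ridge))
```

The two weight vectors should agree up to numerical precision, since the normal equations of the augmented least-squares problem coincide with (7.30) and (7.31).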
The noise precision posterior is the Gamma distribution $\mathrm{Gam}(\tau_k \,|\, a^*_{\tau_k}, b^*_{\tau_k})$.
Using the relation $\nu\lambda / \chi^2_\nu \sim \mathrm{Gam}(\nu/2, \nu\lambda/2)$, where $\nu\lambda / \chi^2_\nu$ is the scaled inverse $\chi^2$