Function TrainClassifier(m_k, X, Y)
Input: matching vector m_k, input matrix X, output matrix Y
Output: D_Y × D_X weight matrix W_k, D_X × D_X covariance matrix Λ_k^{-1},
        noise precision parameters a_τk, b_τk,
        weight vector prior parameters a_αk, b_αk

 1   get D_X, D_Y from shape of X, Y
 2   X_k ← X ⊗ √m_k
 3   Y_k ← Y ⊗ √m_k
 4   a_αk, b_αk ← a_α, b_α
 5   a_τk, b_τk ← a_τ, b_τ
 6   L_k(q) ← −∞
 7   ΔL_k(q) ← Δ_s L_k(q) + 1
 8   while ΔL_k(q) > Δ_s L_k(q) do
 9       E_α(α_k) ← a_αk / b_αk
10       Λ_k ← E_α(α_k) I + X_k^T X_k
11       Λ_k^{-1} ← (Λ_k)^{-1}
12       W_k ← Y_k^T X_k Λ_k^{-1}
13       a_τk ← a_τ + (1/2) Sum(m_k)
14       b_τk ← b_τ + (1/(2 D_Y)) (Sum(Y_k ⊗ Y_k) − Sum(W_k ⊗ W_k Λ_k))
15       E_τ(τ_k) ← a_τk / b_τk
16       a_αk ← a_α + (D_X D_Y) / 2
17       b_αk ← b_α + (1/2) (E_τ(τ_k) Sum(W_k ⊗ W_k) + D_Y Tr(Λ_k^{-1}))
18       L_{k,prev}(q) ← L_k(q)
19       L_k(q) ← VarClBound(X, Y, W_k, Λ_k^{-1}, a_τk, b_τk, a_αk, b_αk, m_k)
20       ΔL_k(q) ← L_k(q) − L_{k,prev}(q)
21       assert ΔL_k(q) ≥ 0
22   return W_k, Λ_k^{-1}, a_τk, b_τk, a_αk, b_αk
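The update loop above translates almost line-for-line into NumPy. The following is a minimal sketch, not the book's implementation: the prior parameters a_α, b_α, a_τ, b_τ default to assumed small constants, and since VarClBound is defined elsewhere in the chapter, the convergence test on ΔL_k(q) is replaced by a fixed iteration count (an assumption, not the stopping rule used above).

```python
import numpy as np

def train_classifier(m_k, X, Y, a_alpha=1e-2, b_alpha=1e-4,
                     a_tau=1e-2, b_tau=1e-4, n_iter=20):
    """Variational update loop of TrainClassifier (Lines 1-17).

    Sketch only: runs a fixed number of iterations instead of
    monitoring the variational bound L_k(q) via VarClBound.
    """
    N, D_X = X.shape
    _, D_Y = Y.shape                               # Line 1
    sqrt_m = np.sqrt(m_k).reshape(-1, 1)
    X_k = X * sqrt_m                               # Line 2: X ⊗ √m_k
    Y_k = Y * sqrt_m                               # Line 3: Y ⊗ √m_k
    a_alpha_k, b_alpha_k = a_alpha, b_alpha        # Line 4
    a_tau_k, b_tau_k = a_tau, b_tau                # Line 5
    for _ in range(n_iter):                        # Lines 8-21 (fixed count)
        E_alpha = a_alpha_k / b_alpha_k            # Line 9
        Lam_k = E_alpha * np.eye(D_X) + X_k.T @ X_k        # Line 10
        Lam_inv = np.linalg.inv(Lam_k)             # Line 11
        W_k = Y_k.T @ X_k @ Lam_inv                # Line 12
        a_tau_k = a_tau + 0.5 * m_k.sum()          # Line 13
        b_tau_k = b_tau + ((Y_k * Y_k).sum()       # Line 14
                           - (W_k * (W_k @ Lam_k)).sum()) / (2 * D_Y)
        E_tau = a_tau_k / b_tau_k                  # Line 15
        a_alpha_k = a_alpha + D_X * D_Y / 2        # Line 16
        b_alpha_k = b_alpha + 0.5 * (E_tau * (W_k * W_k).sum()
                                     + D_Y * np.trace(Lam_inv))  # Line 17
    return W_k, Lam_inv, a_tau_k, b_tau_k, a_alpha_k, b_alpha_k
```

With m_k = 1 for all inputs (a fully matching classifier) and ample low-noise data, W_k approaches the ordinary least-squares solution, since the shrinkage E_α(α_k) I becomes negligible relative to X_k^T X_k.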
to (7.97)–(7.100), L_k(q) is indeed maximised, which is not necessarily the case if r_nk = m_k(x_n), as discussed in Sect. 7.3.4. Therefore, every parameter update is guaranteed to increase L_k(q) until the algorithm converges.
In more detail, Lines 2 and 3 compute the matched input matrix X_k and output matrix Y_k, based on √m_k(x) √m_k(x) = m_k(x). Note that each column of X and Y is element-wise multiplied by √m_k, where the square root is applied to each element of m_k separately. The prior and hyperprior parameters are initialised with their prior parameter values in Lines 4 and 5.
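The column-wise weighting in Lines 2 and 3 can be checked numerically. This sketch, with made-up data, also confirms that X_k^T X_k then equals the matching-weighted scatter matrix Σ_n m_k(x_n) x_n x_n^T exploited further below:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))      # N = 5 inputs of dimension D_X = 3
m_k = rng.uniform(size=5)        # matching vector, one value per input

# Lines 2/3: every column of X is element-wise multiplied by sqrt(m_k),
# so that sqrt(m_k(x)) * sqrt(m_k(x)) = m_k(x) once products are formed.
X_k = X * np.sqrt(m_k)[:, None]

# X_k^T X_k equals the matching-weighted scatter matrix.
scatter = sum(m * np.outer(x, x) for m, x in zip(m_k, X))
print(np.allclose(X_k.T @ X_k, scatter))   # True
```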
In the actual iteration, Lines 9 to 14 compute the parameters of the variational posterior q*_{W,τ}(W_k, τ_k) by the use of (7.97)–(7.100) and (7.64). To get the weight vector covariance Λ_k^{-1}, the equality X_k^T X_k = Σ_n m_k(x_n) x_n x_n^T is used. The weight matrix W_k is evaluated by observing that the j-th row of Y_k^T X_k Λ_k^{-1}, giving w_kj^T, is equivalent to (Λ_k^{-1} Σ_n m_k(x_n) x_n y_nj)^T. The update of b_τk uses Sum(Y_k ⊗ Y_k), which effectively squares each element of Y_k before returning the sum over all elements, that is, Σ_j Σ_n m_k(x_n) y_nj^2. The term Σ_j w_kj^T Λ_k w_kj in (7.100) is computed by observing that it can be reformulated as the sum over all elements of the element-wise multiplication of W_k and W_k Λ_k.
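The last reformulation is easy to verify numerically; a quick sketch with arbitrary stand-in values for W_k and Λ_k:

```python
import numpy as np

rng = np.random.default_rng(2)
D_X, D_Y = 3, 2
W = rng.normal(size=(D_Y, D_X))   # stands in for W_k
A = rng.normal(size=(D_X, D_X))
Lam = A @ A.T                     # symmetric positive definite, like Λ_k

# Sum(W ⊗ W Λ): element-wise product of W and W Λ, summed over all
# elements, reproduces the quadratic-form sum over the rows of W.
lhs = (W * (W @ Lam)).sum()
rhs = sum(w @ Lam @ w for w in W)   # Σ_j w_kj^T Λ_k w_kj
print(np.isclose(lhs, rhs))   # True
```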