Function TrainClassifier(m_k, X, Y)
Input: matching vector m_k, input matrix X, output matrix Y
Output: D_Y × D_X weight matrix W_k, D_X × D_X covariance matrix Λ_k^{-1},
        noise precision parameters a_τk, b_τk, weight vector prior parameters
        a_αk, b_αk

 1  get D_X, D_Y from shape of X, Y
 2  X_k ← X ⊙ √m_k
 3  Y_k ← Y ⊙ √m_k
 4  a_αk, b_αk ← a_α, b_α
 5  a_τk, b_τk ← a_τ, b_τ
 6  L_k(q) ← −∞
 7  ΔL_k(q) ← Δ_s L_k(q) + 1
 8  while ΔL_k(q) > Δ_s L_k(q) do
 9      E_α(α_k) ← a_αk / b_αk
10      Λ_k ← E_α(α_k) I + X_k^T X_k
11      Λ_k^{-1} ← (Λ_k)^{-1}
12      W_k ← Y_k^T X_k Λ_k^{-1}
13      a_τk ← a_τ + (1/2) Sum(m_k)
14      b_τk ← b_τ + (1/(2 D_Y)) (Sum(Y_k ⊙ Y_k) − Sum(W_k ⊙ W_k Λ_k))
15      E_τ(τ_k) ← a_τk / b_τk
16      a_αk ← a_α + D_X D_Y / 2
17      b_αk ← b_α + (1/2) (E_τ(τ_k) Sum(W_k ⊙ W_k) + D_Y Tr(Λ_k^{-1}))
18      L_{k,prev}(q) ← L_k(q)
19      L_k(q) ← VarClBound(X, Y, W_k, Λ_k^{-1}, a_τk, b_τk, a_αk, b_αk, m_k)
20      ΔL_k(q) ← L_k(q) − L_{k,prev}(q)
21      assert ΔL_k(q) ≥ 0
22  return W_k, Λ_k^{-1}, a_τk, b_τk, a_αk, b_αk
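The listing above translates almost directly into NumPy. The sketch below is one possible reading of the algorithm, not a reference implementation: the prior values are placeholders, and since VarClBound is defined elsewhere, the convergence test on ΔL_k(q) is replaced by a stand-in check on the change in W_k.

```python
import numpy as np

def train_classifier(m_k, X, Y, a_alpha=1e-2, b_alpha=1e-4,
                     a_tau=1e-2, b_tau=1e-4, max_iter=50, tol=1e-8):
    """Variational classifier training sketch.

    m_k : (N,) matching vector, X : (N, D_X), Y : (N, D_Y).
    Prior parameters and the convergence test are assumptions; the book
    monitors the variational bound L_k(q) via VarClBound instead.
    """
    D_X, D_Y = X.shape[1], Y.shape[1]              # Line 1
    sqrt_m = np.sqrt(m_k)[:, None]
    X_k = X * sqrt_m                               # Line 2: matched inputs
    Y_k = Y * sqrt_m                               # Line 3: matched outputs
    a_ak, b_ak = a_alpha, b_alpha                  # Lines 4-5: start at priors
    a_tk, b_tk = a_tau, b_tau
    W_k = np.zeros((D_Y, D_X))
    for _ in range(max_iter):
        W_prev = W_k
        E_alpha = a_ak / b_ak                      # Line 9
        Lam = E_alpha * np.eye(D_X) + X_k.T @ X_k  # Line 10
        Lam_inv = np.linalg.inv(Lam)               # Line 11
        W_k = Y_k.T @ X_k @ Lam_inv                # Line 12
        a_tk = a_tau + 0.5 * m_k.sum()             # Line 13
        b_tk = b_tau + (np.sum(Y_k * Y_k)
                        - np.sum(W_k * (W_k @ Lam))) / (2 * D_Y)  # Line 14
        E_tau = a_tk / b_tk                        # Line 15
        a_ak = a_alpha + 0.5 * D_X * D_Y           # Line 16
        b_ak = b_alpha + 0.5 * (E_tau * np.sum(W_k * W_k)
                                + D_Y * np.trace(Lam_inv))        # Line 17
        if np.max(np.abs(W_k - W_prev)) < tol:     # stand-in for the L_k(q) test
            break
    return W_k, Lam_inv, a_tk, b_tk, a_ak, b_ak
```

On fully matched data (m_k = 1 everywhere) the fixed point of this loop is essentially ridge regression with a self-tuned ridge strength E_α(α_k).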
to (7.97)–(7.100), L_k(q) is indeed maximised, which is not necessarily the case if r_nk ≠ m_k(x_n), as discussed in Sect. 7.3.4. Therefore, every parameter update is guaranteed to increase L_k(q), until the algorithm converges.
In more detail, Lines 2 and 3 compute the matched input matrix X_k and output matrix Y_k, based on √m_k(x) √m_k(x) = m_k(x). Note that each column of X and Y is element-wise multiplied by √m_k, where the square root is applied to each element of m_k separately. The prior and hyperprior parameters are initialised with their prior parameter values in Lines 4 and 5.
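With observations stored as rows, the matching step of Lines 2 and 3 is a single broadcast multiplication. A minimal sketch with hypothetical data values:

```python
import numpy as np

# Hypothetical 4-observation example; m_k holds matching degrees in [0, 1].
X = np.arange(8.0).reshape(4, 2)       # (N, D_X) input matrix
m_k = np.array([1.0, 0.25, 0.0, 1.0])  # matching vector

# Each column of X is multiplied element-wise by sqrt(m_k), so that
# X_k^T X_k = sum_n m_k(x_n) x_n x_n^T without an explicit loop.
X_k = X * np.sqrt(m_k)[:, None]

# Verify the identity used later for the precision matrix.
direct = sum(m * np.outer(x, x) for m, x in zip(m_k, X))
assert np.allclose(X_k.T @ X_k, direct)
```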
In the actual iteration, Lines 9 to 14 compute the parameters of the variational posterior q*_{W,τ}(W_k, τ_k) by the use of (7.97)–(7.100) and (7.64). To get the weight vector covariance Λ_k^{-1}, the equality X_k^T X_k = Σ_n m_k(x_n) x_n x_n^T is used. The weight matrix W_k is evaluated by observing that the j-th row of Y_k^T X_k Λ_k^{-1}, giving w_kj^T, is equivalent to Λ_k^{-1} Σ_n m_k(x_n) x_n y_nj. The update of b_τk uses Sum(Y_k ⊙ Y_k), which effectively squares each element of Y_k before returning the sum over all elements, that is, Σ_j Σ_n m_k(x_n) y_nj². The term Σ_j w_kj^T Λ_k w_kj in (7.100) is computed by observing that it can be reformulated as the sum over all elements of the element-wise multiplication of W_k and W_k Λ_k.
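The two reformulations above can be checked numerically on small random matrices; this is a standalone sanity check with made-up inputs, not part of the algorithm itself:

```python
import numpy as np

rng = np.random.default_rng(1)
D_X, D_Y, N = 4, 3, 5
W = rng.standard_normal((D_Y, D_X))
A = rng.standard_normal((D_X, D_X))
Lam = A @ A.T + D_X * np.eye(D_X)   # symmetric positive definite, like Λ_k

# sum_j w_kj^T Λ_k w_kj equals the sum over all elements of W ⊙ (W Λ_k).
quad_loop = sum(W[j] @ Lam @ W[j] for j in range(D_Y))
assert np.isclose(quad_loop, np.sum(W * (W @ Lam)))

# Sum(Y_k ⊙ Y_k) is the sum of all squared elements of Y_k.
Y_k = rng.standard_normal((N, D_Y))
squares = sum(Y_k[n, j] ** 2 for n in range(N) for j in range(D_Y))
assert np.isclose(np.sum(Y_k * Y_k), squares)
```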