Lines 15 to 17 update the parameters of the variational posterior
$q_\alpha(\alpha_k)$, as given by (7.40), (7.41), and (7.72). Here, the sum over
all squared elements of $\mathbf{W}_k$ is used to evaluate
$\sum_j \mathbf{w}_{kj}^T \mathbf{w}_{kj}$.
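As a small illustration of this last remark, the following sketch checks that summing all squared elements of a weight matrix gives the same value as summing the inner products $\mathbf{w}_{kj}^T \mathbf{w}_{kj}$ over its vectors. The matrix values and helper names are made up for illustration, and each row is assumed to hold one $\mathbf{w}_{kj}$ (the total is the same if columns are used instead).

```python
def sum_squared_elements(W):
    """Sum of all squared entries of a matrix given as a list of rows."""
    return sum(w * w for row in W for w in row)


def sum_row_inner_products(W):
    """Sum over rows j of the inner product w_j^T w_j."""
    return sum(sum(a * b for a, b in zip(row, row)) for row in W)


# Made-up 2 x 3 weight matrix, for illustration only.
W_k = [[1.0, -2.0, 0.5],
       [0.0, 3.0, -1.0]]

elementwise = sum_squared_elements(W_k)
rowwise = sum_row_inner_products(W_k)
print(elementwise == rowwise)
```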
The function determines convergence of the parameter updates in Lines 18 to
21 by computing the change of $\mathcal{L}_k(q)$ over two successive iterations.
If this change drops below the system parameter $\Delta_s \mathcal{L}_k(q)$, then
the function returns. The value of $\mathcal{L}_k(q)$ is computed by Function
VarClBound, which is described in Sect. 8.1.4. Its last argument is a vector of
responsibilities for classifier $k$, which is substituted by the matching
function values for reasons mentioned above. Each parameter update either
increases $\mathcal{L}_k(q)$ or leaves it unchanged, as specified in Line 21. If
this is not the case, then the implementation is faulty and/or suffers from
numerical instabilities. In the experiments that were performed, convergence was
usually reached after 3-4 iterations.
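The stopping rule and the monotonicity check described above can be sketched as follows. This is a minimal illustration, not the book's implementation: `update`, `bound`, `tol`, and `max_iter` are hypothetical names, and the toy update and bound at the end are made up so that the loop can be run.

```python
import math


def iterate_until_converged(update, bound, state,
                            delta_s=1e-4, tol=1e-9, max_iter=1000):
    """Repeat update(state) until the bound changes by less than delta_s.

    update, bound, tol and max_iter are hypothetical; only the stopping
    rule and the monotonicity check follow the text.
    """
    L = -math.inf
    for iteration in range(1, max_iter + 1):
        state = update(state)
        L_prev, L = L, bound(state)
        # Each update should increase the bound or leave it unchanged.
        # A decrease beyond tol hints at a bug or numerical instability.
        assert L >= L_prev - tol, "variational bound decreased"
        if abs(L - L_prev) < delta_s:
            return state, L, iteration
    raise RuntimeError("no convergence within max_iter iterations")


# Toy stand-ins: the state moves halfway towards 1 on every update and
# the "bound" is the negative squared distance to 1.
state, L, iterations = iterate_until_converged(
    update=lambda x: x + 0.5 * (1.0 - x),
    bound=lambda x: -(1.0 - x) ** 2,
    state=0.0,
)
```

Starting the bound at negative infinity makes the first change infinite, so at least two updates are always performed before the loop can stop.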
8.1.3 Training the Mixing Model
Training the mixing model is more complex than training the classifiers, as
the IRLS algorithm is used to find the parameters of $q_V(\mathbf{V})$. The
function TrainMixing takes the model structure, data, and the parameters of the
fully trained classifiers, and returns the parameters of the mixing model.
As with training the classifiers, the parameters of the mixing model are found
incrementally, by sequentially updating the parameters of the variational
posteriors $q_V(\mathbf{V})$, $q_\beta(\boldsymbol{\beta})$ and
$q_Z(\mathbf{Z})$. Convergence of the updates is determined by
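Function TrainMixing($\mathbf{M}, \mathbf{X}, \mathbf{Y}, \boldsymbol{\Phi}, \mathbf{W}, \boldsymbol{\Lambda}^{-1}, \mathbf{a}_\tau, \mathbf{b}_\tau, \mathbf{a}_\alpha, \mathbf{b}_\alpha$)
Input: matching matrix $\mathbf{M}$, input matrix $\mathbf{X}$, output matrix $\mathbf{Y}$, mixing feature
  matrix $\boldsymbol{\Phi}$, classifier parameters $\mathbf{W}, \boldsymbol{\Lambda}^{-1}, \mathbf{a}_\tau, \mathbf{b}_\tau, \mathbf{a}_\alpha, \mathbf{b}_\alpha$
Output: $D_V \times K$ mixing weight matrix $\mathbf{V}$, $(K D_V) \times (K D_V)$ mixing weight
  covariance matrix $\boldsymbol{\Lambda}_V^{-1}$, mixing weight vector prior parameters $\mathbf{a}_\beta, \mathbf{b}_\beta$
 1  get $D_X, D_Y, D_V, K$ from shape of $\mathbf{X}, \mathbf{Y}, \boldsymbol{\Phi}, \mathbf{W}$
 2  $\mathbf{V} \leftarrow D_V \times K$ matrix with elements sampled from $\mathcal{N}(0, b_\beta / a_\beta)$
 3  $\mathbf{a}_\beta \leftarrow \{a_{\beta_1}, \dots, a_{\beta_K}\}$, all initialised to $a_{\beta_k} = a_\beta$
 4  $\mathbf{b}_\beta \leftarrow \{b_{\beta_1}, \dots, b_{\beta_K}\}$, all initialised to $b_{\beta_k} = b_\beta$
 5  $\mathcal{L}_M(q) \leftarrow -\infty$
 6  $\Delta\mathcal{L}_M(q) \leftarrow \Delta_s \mathcal{L}_M(q) + 1$
 7  while $\Delta\mathcal{L}_M(q) > \Delta_s \mathcal{L}_M(q)$ do
 8      $\mathbf{V}, \boldsymbol{\Lambda}_V^{-1} \leftarrow$ TrainMixWeights($\mathbf{M}, \mathbf{X}, \mathbf{Y}, \boldsymbol{\Phi}, \mathbf{W}, \boldsymbol{\Lambda}^{-1}, \mathbf{a}_\tau, \mathbf{b}_\tau, \mathbf{V}, \mathbf{a}_\beta, \mathbf{b}_\beta$)
 9      $\mathbf{a}_\beta, \mathbf{b}_\beta \leftarrow$ TrainMixPriors($\mathbf{V}, \boldsymbol{\Lambda}_V^{-1}$)
10      $\mathbf{G} \leftarrow$ Mixing($\mathbf{M}, \boldsymbol{\Phi}, \mathbf{V}$)
11      $\mathbf{R} \leftarrow$ Responsibilities($\mathbf{X}, \mathbf{Y}, \mathbf{G}, \mathbf{W}, \boldsymbol{\Lambda}^{-1}, \mathbf{a}_\tau, \mathbf{b}_\tau$)
12      $\mathcal{L}_{M,prev}(q) \leftarrow \mathcal{L}_M(q)$
13      $\mathcal{L}_M(q) \leftarrow$ VarMixBound($\mathbf{G}, \mathbf{R}, \mathbf{V}, \boldsymbol{\Lambda}_V^{-1}, \mathbf{a}_\beta, \mathbf{b}_\beta$)
14      $\Delta\mathcal{L}_M(q) \leftarrow |\mathcal{L}_M(q) - \mathcal{L}_{M,prev}(q)|$
15  return $\mathbf{V}, \boldsymbol{\Lambda}_V^{-1}, \mathbf{a}_\beta, \mathbf{b}_\beta$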
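The control flow of Function TrainMixing might be sketched in Python as below. This is only an illustration under several assumptions: the five sub-functions are passed in through a hypothetical `steps` object rather than implemented, matrices are plain lists of rows, $\mathbf{W}$ is taken to be a list with one entry per classifier, the prior parameters and the threshold are passed as arguments (`a_beta0`, `b_beta0`, `delta_s`), and the initial weights are drawn with variance $b_\beta / a_\beta$. The stub at the end exists purely to exercise the loop and returns made-up bound values.

```python
import math
import random
from types import SimpleNamespace


def train_mixing(M, X, Y, Phi, W, Lambda_inv, a_tau, b_tau, a_alpha, b_alpha,
                 steps, a_beta0, b_beta0, delta_s=1e-4, rng=None):
    """Outer loop of TrainMixing; `steps` supplies the sub-functions.

    a_alpha and b_alpha only mirror the original signature and are unused.
    """
    rng = rng or random.Random(0)
    D_V, K = len(Phi[0]), len(W)  # assumed layout: Phi is N x D_V
    # Line 2: random initial weights (variance b_beta / a_beta assumed).
    std = math.sqrt(b_beta0 / a_beta0)
    V = [[rng.gauss(0.0, std) for _ in range(K)] for _ in range(D_V)]
    # Lines 3-4: one prior parameter pair per classifier.
    a_beta, b_beta = [a_beta0] * K, [b_beta0] * K
    # Lines 5-6: force at least one pass through the loop.
    L, dL = -math.inf, delta_s + 1.0
    Lambda_V_inv = None
    while dL > delta_s:  # Line 7
        V, Lambda_V_inv = steps.train_mix_weights(
            M, X, Y, Phi, W, Lambda_inv, a_tau, b_tau, V, a_beta, b_beta)
        a_beta, b_beta = steps.train_mix_priors(V, Lambda_V_inv)
        G = steps.mixing(M, Phi, V)
        R = steps.responsibilities(X, Y, G, W, Lambda_inv, a_tau, b_tau)
        L_prev = L
        L = steps.var_mix_bound(G, R, V, Lambda_V_inv, a_beta, b_beta)
        dL = abs(L - L_prev)  # Line 14
    return V, Lambda_V_inv, a_beta, b_beta


# Stub sub-functions with made-up return values, only to run the loop.
bounds = iter([-1.0, -0.5, -0.5])
stub = SimpleNamespace(
    train_mix_weights=lambda M, X, Y, Phi, W, Li, at, bt, V, ab, bb: (V, None),
    train_mix_priors=lambda V, Lv: ([1.0] * len(V[0]), [1.0] * len(V[0])),
    mixing=lambda M, Phi, V: None,
    responsibilities=lambda X, Y, G, W, Li, at, bt: None,
    var_mix_bound=lambda G, R, V, Lv, ab, bb: next(bounds),
)
V, Lambda_V_inv, a_beta, b_beta = train_mixing(
    M=None, X=None, Y=None, Phi=[[1.0, 0.0]] * 3, W=[None] * 3,
    Lambda_inv=None, a_tau=None, b_tau=None, a_alpha=None, b_alpha=None,
    steps=stub, a_beta0=0.01, b_beta0=0.0001)
```

Passing the sub-functions in as an object keeps the sketch short and lets the loop be tested separately from the numerical routines it calls.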
 