Function ModelProbability(M, X, Y, Φ)
Input: matching matrix M, input matrix X, output matrix Y, mixing feature matrix Φ
Output: approximate model probability L(q) + ln p(M)

 1: get K from shape of M
 2: for k ← 1 to K do
 3:     m_k ← kth column of M
 4:     W_k, Λ_k^{-1}, a_τk, b_τk, a_αk, b_αk ← TrainClassifier(m_k, X, Y)
 5: W, Λ^{-1} ← {W_1, ..., W_K}, {Λ_1^{-1}, ..., Λ_K^{-1}}
 6: a_τ, b_τ ← {a_τ1, ..., a_τK}, {b_τ1, ..., b_τK}
 7: a_α, b_α ← {a_α1, ..., a_αK}, {b_α1, ..., b_αK}
 8: V, Λ_V^{-1}, a_β, b_β ← TrainMixing(M, X, Y, Φ, W, Λ^{-1}, a_τ, b_τ, a_α, b_α)
 9: θ ← {W, Λ^{-1}, a_τ, b_τ, a_α, b_α, V, Λ_V^{-1}, a_β, b_β}
10: L(q) ← VarBound(M, X, Y, Φ, θ)
11: return L(q) + ln K!
The function computes an approximation to the unnormalised log-posterior ln p(M|D). Thus, it replaces the model evidence p(D|M) in (7.3) by its approximation L(q). The function assumes that the order of the classifiers can be arbitrarily permuted without changing the model structure and therefore uses the p(M) given by (7.4). In approximating ln p(M|D), the function does not add the normalisation constant. Hence, even though the return values are not proper probabilities, they can still be used for the comparison of different model structures, as the normalisation term is shared between all of them.
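To see why, note that by Bayes' rule ln p(M|D) = ln p(D|M) + ln p(M) − ln p(D), where ln p(D) is the same for every model structure. Comparing two structures M_1 and M_2 by the difference of the returned values, with L(q; M_i) denoting the bound obtained for structure M_i,

    [L(q; M_1) + ln p(M_1)] − [L(q; M_2) + ln p(M_2)] ≈ ln p(M_1|D) − ln p(M_2|D),

thus recovers the correct log-odds even though the constant −ln p(D) is never added.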
The computation of L(q) + ln p(M) is straightforward: Lines 2 to 7 compute
and assemble the parameters of the classifiers by calling TrainClassifier for
each classifier k separately, and provide it with the data and the matching vector
m_k for that classifier. After that, the mixing model parameters are computed in Line 8 by calling TrainMixing, based on the fully trained classifiers.
Having evaluated all classifiers, the function collects all parameters in Line 9 to give θ, and uses them in Line 10 to compute L(q) by calling VarBound. After that, it returns L(q) + ln K!, based on (7.3) and (7.4).
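Expressed in code, the procedure might look as follows. This is only a minimal sketch of the above pseudocode, assuming NumPy arrays for the matrices; the helpers train_classifier, train_mixing and var_bound stand in for TrainClassifier, TrainMixing and VarBound, and the parameter bookkeeping of Lines 5 to 9 is folded into simple containers.

import math

def model_probability(M, X, Y, Phi):
    """Return the approximate model probability L(q) + ln K!.

    M   -- (N, K) matching matrix, one column per classifier
    X   -- (N, D_X) input matrix
    Y   -- (N, D_Y) output matrix
    Phi -- (N, D_V) mixing feature matrix
    """
    K = M.shape[1]                                     # Line 1: K from shape of M
    classifiers = []
    for k in range(K):                                 # Lines 2-4
        m_k = M[:, k]                                  # matching vector of classifier k
        classifiers.append(train_classifier(m_k, X, Y))
    # Lines 5-7: the per-classifier parameters stay grouped in `classifiers`
    mixing = train_mixing(M, X, Y, Phi, classifiers)   # Line 8
    theta = (classifiers, mixing)                      # Line 9: collect all parameters
    L_q = var_bound(M, X, Y, Phi, theta)               # Line 10: variational bound L(q)
    return L_q + math.lgamma(K + 1)                    # Line 11: ln K! = ln Γ(K + 1)

Computing ln K! via math.lgamma(K + 1) avoids evaluating the factorial itself, which would overflow for large K.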
8.1.2 Training the Classifiers
The function TrainClassifier takes the data X, Y and the matching vector m_k, and returns all model parameters for the trained classifier k. The model parameters are found by iteratively updating the distribution parameters of the variational posteriors q_{W,τ}(W_k, τ_k) and q_α(α_k) until the convergence criterion is satisfied. This criterion is given by the classifier-specific components L_k(q) of the variational bound L(q), as given by (7.91). However, rather than evaluating L_k(q) with the responsibilities r_nk, as done in (7.91), the matching function values m_k(x_n) are used instead. The underlying idea is that, as each classifier is trained independently, the responsibilities are equivalent to the matching function values. This has the effect that by updating the classifier parameters according
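A minimal Python sketch of such a training loop, under stated assumptions, is given below; init_classifier_params, update_classifier_posteriors and classifier_bound are hypothetical helpers for initialising the posterior parameters, performing the variational updates, and evaluating L_k(q), and the convergence threshold tol is an arbitrary choice.

def train_classifier(m_k, X, Y, max_iter=1000, tol=1e-4):
    """Iteratively refine the variational posteriors of classifier k.

    The parameters of q_{W,tau}(W_k, tau_k) and q_alpha(alpha_k) are
    updated in turn until the classifier-specific bound L_k(q) increases
    by less than tol. As described above, the matching values m_k(x_n)
    take the place of the responsibilities r_nk.
    """
    params = init_classifier_params(X, Y)          # hypothetical initialiser
    L_prev = float('-inf')
    for _ in range(max_iter):
        # alternate the updates of q_{W,tau}(W_k, tau_k) and q_alpha(alpha_k)
        params = update_classifier_posteriors(params, m_k, X, Y)
        L_k = classifier_bound(params, m_k, X, Y)  # L_k(q), cf. (7.91)
        if L_k - L_prev < tol:                     # convergence criterion met
            break
        L_prev = L_k
    return params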