x_nᵀ Λ_k⁻¹ x_n = Σ_i (x_n)_i (Λ_k⁻¹ x_n)_i, based on x_nᵀ Λ_k⁻¹ and the rows of X. The values of g_k(x_n) are added to ρ_nk in Line 6, and the normalisation step by (7.63) is performed in Line 7. For the same reason as in the Mixing function, all NaN values in R need to be subsequently replaced by 0 to not assign responsibility to any classifiers for inputs that are not matched.
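To make the normalisation and NaN replacement concrete, here is a minimal numpy sketch of that step; the names (normalise_responsibilities, rho) are illustrative and not taken from the original listing, and rho is assumed to hold the unnormalised responsibilities ρ_nk as an N × K array.

    import numpy as np

    def normalise_responsibilities(rho):
        # Row-wise normalisation as in (7.63): r_nk = rho_nk / sum_k rho_nk.
        # Rows of inputs that no classifier matches give 0/0 = NaN,
        # which is replaced by 0 so that they receive no responsibility.
        with np.errstate(invalid='ignore', divide='ignore'):
            R = rho / rho.sum(axis=1, keepdims=True)
        return np.nan_to_num(R, nan=0.0)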
Function TrainMixWeights(M, X, Y, Φ, W, Λ⁻¹, a_τ, b_τ, V, a_β, b_β)
Input: matching matrix M, input matrix X, output matrix Y, mixing feature matrix Φ, classifier parameters W, Λ⁻¹, a_τ, b_τ, mixing weight matrix V, mixing weight prior parameters a_β, b_β
Output: D_V × K mixing weight matrix V, (K D_V) × (K D_V) mixing weight covariance matrix Λ_V⁻¹
 1  E_β(β) ← row vector with elements a_β1/b_β1, ..., a_βK/b_βK
 2  G ← Mixing(M, Φ, V)
 3  R ← Responsibilities(X, Y, G, W, Λ⁻¹, a_τ, b_τ)
 4  KL(R‖G) ← ∞
 5  ΔKL(R‖G) ← Δ_s KL(R‖G) + 1
 6  while ΔKL(R‖G) > Δ_s KL(R‖G) do
 7      E ← Φᵀ(G − R) + V E_β(β)
 8      e ← (E_11, ..., E_{D_V 1}, E_12, ..., E_{D_V 2}, ..., E_1K, ..., E_{D_V K})ᵀ
 9      H ← Hessian(Φ, G, a_β, b_β)
10      Δv ← −H⁻¹ e
11      ΔV ← D_V × K matrix with jk-th element
12           given by the ((k − 1) D_V + j)-th element of Δv
13      V ← V + ΔV
14      G ← Mixing(M, Φ, V)
15      R ← Responsibilities(X, Y, G, W, Λ⁻¹, a_τ, b_τ)
16      KL_prev(R‖G) ← KL(R‖G)
17      KL(R‖G) ← −Sum(R ⊗ FixNaN(ln(G ⊘ R), 0))
18      ΔKL(R‖G) ← |KL_prev(R‖G) − KL(R‖G)|
19  H ← Hessian(Φ, G, a_β, b_β)
20  Λ_V⁻¹ ← H⁻¹
21  return V, Λ_V⁻¹
The Function TrainMixWeights approximates the variational posterior q_V(V) (7.51) of the mixing weights by performing the IRLS algorithm. It takes the matching matrix, the data and mixing feature matrices, the trained classifier parameters, the mixing weight matrix, and the mixing weight prior parameters. As the IRLS algorithm performs incremental updates of the mixing weights V until convergence, V is not re-initialised every time TrainMixWeights is called; rather, the previous estimates are used as initial values to reduce the number of iterations required until convergence.
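To illustrate Lines 7-13 of the listing above, the following numpy sketch performs a single IRLS update of the mixing weights. It assumes Phi is the N × D_V mixing feature matrix, G and R the N × K mixing coefficient and responsibility matrices, V the D_V × K mixing weight matrix, E_beta a length-K vector of the expectations a_βk/b_βk, and H the (K D_V) × (K D_V) Hessian returned by the Hessian function; the variable and function names are illustrative only.

    import numpy as np

    def irls_step(Phi, G, R, V, E_beta, H):
        # Gradient E (Line 7): column k of V is scaled by E_beta[k].
        E = Phi.T @ (G - R) + V * E_beta
        # Stack the columns of E into a single vector e (Line 8).
        e = E.flatten(order='F')
        # Newton step: Delta v = -H^{-1} e (Line 10).
        dv = -np.linalg.solve(H, e)
        # Un-stack into a D_V x K matrix Delta V (Lines 11-12) and update V (Line 13).
        dV = dv.reshape(V.shape, order='F')
        return V + dV

Using column-major (Fortran) order for flattening and reshaping reproduces the element ordering of e and ΔV given in Lines 8 and 11-12.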
As the aim is to model the responsibilities by finding mixing weights that make the mixing coefficients given by g_k(x_n) similar to r_nk, convergence is determined by monitoring the change in the Kullback-Leibler divergence KL(R‖G) between the responsibilities and the mixing coefficients.
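A matching sketch of the convergence monitor in Lines 17-18 computes the NaN-guarded Kullback-Leibler divergence; again, the names are illustrative.

    import numpy as np

    def kl_R_G(R, G):
        # KL(R||G) = -sum_{n,k} r_nk ln(g_nk / r_nk), with NaN terms
        # (for example 0 * ln(0/0)) treated as 0, mirroring FixNaN(., 0).
        with np.errstate(divide='ignore', invalid='ignore'):
            terms = R * np.log(G / R)
        return -np.nansum(terms)

The loop in the listing terminates once the absolute change |KL_prev(R‖G) − KL(R‖G)| no longer exceeds the threshold Δ_s KL(R‖G).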
 