to a certain range, with the following underlying idea [173]: they are limited from below by exp_min to ensure that their exponential is positive, as their logarithm might be taken later. Additionally, they are limited from above by ln_max − ln K, such that summing over K such elements does not cause an overflow. Once this is done, the element-wise exponential can be taken, and each element is multiplied by the corresponding matching function value, as done in Line 4. This essentially gives the numerator of (7.10) for all combinations of n and k. Normalisation over k is performed in the next line by dividing each element of a row by that row's element sum. If a row of G was zero before normalisation, 0/0 was evaluated, which is fixed in Line 6 by assigning equal weights to all classifiers for inputs that are not matched by any classifier. Usually this should never happen, as only model structures are accepted for which Σ_k m_k(x_n) > 0 for all n. Nonetheless, this check was added to ensure that even these cases are handled gracefully.
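As a concrete illustration, the following is a minimal NumPy sketch of this clamping-and-normalisation step, not the implementation referenced in [173]; the function name clamped_gating, the log-space activation matrix A, and the matching matrix M are assumptions made for this example, since the computation of the activations themselves is not shown here.

import numpy as np

def clamped_gating(A, M):
    # A: N x K matrix of log-space gating activations (assumed given)
    # M: N x K matrix of matching function values m_k(x_n)
    K = A.shape[1]
    # Lowest value whose exponential is still positive, and highest
    # value such that summing K exponentials cannot overflow.
    exp_min = np.log(np.finfo(float).tiny)
    ln_max = np.log(np.finfo(float).max)
    A = np.clip(A, exp_min, ln_max - np.log(K))
    # Element-wise exponential, weighted by the matching functions;
    # this corresponds to the numerator of (7.10).
    G = np.exp(A) * M
    # Normalise each row over k. All-zero rows give 0/0 = NaN,
    # which is replaced by equal weights 1/K for unmatched inputs.
    with np.errstate(invalid='ignore'):
        G = G / G.sum(axis=1, keepdims=True)
    return np.where(np.isnan(G), 1.0 / K, G)

Clamping before exponentiation guarantees that the subsequent sum over k cannot overflow, and that the row sums are positive for any input that is matched by at least one classifier.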
Function Responsibilities(X, Y, G, W, Λ⁻¹, a_τ, b_τ)
   Input: input matrix X, output matrix Y, gating matrix G,
          classifier parameters W, Λ⁻¹, a_τ, b_τ
   Output: N × K responsibility matrix R
 1 get K, D_Y from shape of Y, G
 2 for k = 1 to K do
 3     W_k, Λ_k⁻¹, a_τk, b_τk ← pick from W, Λ⁻¹, a_τ, b_τ
 4     kth column of R ← exp( (D_Y/2)(ψ(a_τk) − ln b_τk)
 5         − (1/2)( (a_τk/b_τk) RowSum((Y − X W_kᵀ)²) + D_Y RowSum(X ⊗ XΛ_k⁻¹) ) )
 6 R ← R ⊗ G
 7 R ← R ⊘ RowSum(R)
 8 FixNaN(R, 0)
 9 return R
Based on the gating matrix G and the goodness-of-fit of the classifiers, the Function Responsibilities computes the N × K responsibility matrix with r_nk as its nkth element. Its elements are evaluated by following (7.62), (7.63), (7.69) and (7.74).
The loop from Line 2 to 5 in Responsibilities iterates over all k to fill the columns of R with the values for ρ_nk according to (7.62), but without the term g_k(x_n).¹ This is simplified by observing that the term Σ_j (y_nj − w_kjᵀ x_n)², which is by (7.74) part of Σ_j E_{W,τ}(τ_k (y_nj − w_kjᵀ x_n)²), is given for each observation separately in the vector that results from summing over the rows of (Y − XW_kᵀ)², where the square is taken element-wise. Similarly, x_nᵀ Λ_k⁻¹ x_n of the same expectation is given for each observation by the vector that results from summing over the rows of X ⊗ XΛ_k⁻¹.
¹ Note that we are operating on ρ_nk rather than ln ρ_nk, as given by (7.62), as we certainly have g_k(x_n) = 0 in cases where m_k(x_n) = 0, which would lead to subsequent numerical problems when evaluating ln g_k(x_n).
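To connect the listing to runnable code, here is a NumPy sketch of Responsibilities, under the assumption that W and Λ⁻¹ are passed as lists of per-classifier arrays (each W_k of shape D_Y × D_X, each Λ_k⁻¹ of shape D_X × D_X) and a_τ, b_τ as length-K vectors; the surrounding implementation may organise these parameters differently.

import numpy as np
from scipy.special import digamma

def responsibilities(X, Y, G, W, Lambda_inv, a_tau, b_tau):
    # Line 1: get K and D_Y from the shapes of Y and G
    N, D_Y = Y.shape
    K = G.shape[1]
    R = np.empty((N, K))
    for k in range(K):
        # Line 3: pick the parameters of classifier k
        W_k, Lam_inv_k = W[k], Lambda_inv[k]
        a_k, b_k = a_tau[k], b_tau[k]
        # RowSum((Y - X W_k^T)^2): sum_j (y_nj - w_kj^T x_n)^2 per observation
        resid = ((Y - X @ W_k.T) ** 2).sum(axis=1)
        # RowSum(X (x) X Lam_k^-1): x_n^T Lam_k^-1 x_n per observation
        quad = (X * (X @ Lam_inv_k)).sum(axis=1)
        # Lines 4-5: rho_nk without the gating term g_k(x_n)
        R[:, k] = np.exp(0.5 * D_Y * (digamma(a_k) - np.log(b_k))
                         - 0.5 * (a_k / b_k * resid + D_Y * quad))
    R *= G                                    # Line 6: multiply by gating
    with np.errstate(invalid='ignore'):
        R /= R.sum(axis=1, keepdims=True)     # Line 7: normalise rows
    R[np.isnan(R)] = 0.0                      # Line 8: FixNaN(R, 0)
    return R                                  # Line 9

Setting NaN rows to 0 rather than 1/K mirrors FixNaN(R, 0) in the listing; as the footnote notes, such rows should not occur for accepted model structures.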
 