determined by the Kullback-Leibler divergence measure $\mathrm{KL}(R\|G)$ that measures the distance between the probability distributions given by $R$ and $G$. Formally, it is defined by $\mathrm{KL}(R\|G) = -\sum_n \sum_k r_{nk} \ln\bigl(g_k(x_n)/r_{nk}\bigr)$, and is represented in $\mathcal{L}_M(q)$ (7.95) by the terms $\mathbb{E}_{Z,V}(\ln p(Z|V)) - \mathbb{E}_Z(\ln q(Z))$, given by (7.84). As the Kullback-Leibler divergence is non-negative and zero if and only if $R = G$ [232], the algorithm assumes convergence of the IRLS algorithm if the change in $\mathrm{KL}(R\|G)$ between two successive iterations is below the system parameter $\Delta_s^{\mathrm{KL}(R\|G)}$.
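As a rough sketch of this convergence test (in NumPy-style Python; update_G_R, kl_divergence and irls_step are hypothetical helpers standing in for the steps detailed below, not functions from the algorithm itself), the loop could look as follows:

```python
import numpy as np

# Hypothetical skeleton of the IRLS loop inside TrainMixWeights; update_G_R,
# kl_divergence and irls_step stand in for the steps detailed below and are
# assumptions of this sketch, not functions quoted from the book.
def train_mix_weights_loop(V, Phi, E_beta, delta_s_kl, update_G_R, kl_divergence, irls_step):
    G, R = update_G_R(V)
    kl, kl_prev = kl_divergence(R, G), np.inf
    while abs(kl_prev - kl) >= delta_s_kl:           # change in KL(R||G) still too large?
        V = irls_step(V, Phi, G, R, E_beta)          # one IRLS update of the mixing weights
        G, R = update_G_R(V)                         # recompute with the updated weights
        kl_prev, kl = kl, kl_divergence(R, G)
    return V, G, R
```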
TrainMixWeights starts by computing the expectation $\mathbb{E}_\beta(\beta_k)$ for all $k$ in Line 1. The IRLS iteration (6.5) requires the error gradient $\nabla E(V)$ and the Hessian $H$, which are by (7.48) and (7.49) based on the values of $g_k(x_n)$ and $r_{nk}$. Hence, TrainMixWeights continues by computing $G$ and $R$ in Lines 2 and 3.
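A minimal sketch of these first steps, under the assumptions that $q(\beta_k)$ is a Gamma posterior with parameters $a_{\beta_k}, b_{\beta_k}$ (so that $\mathbb{E}_\beta(\beta_k) = a_{\beta_k}/b_{\beta_k}$) and that $g_k(x_n)$ is a generalised softmax of $v_k^T \phi(x_n)$; any matching weighting and the computation of $R$, which also depends on the classifier predictions, are omitted:

```python
import numpy as np

# Sketch of Lines 1-3 of TrainMixWeights (assumption-laden, not the book's
# exact algorithm): q(beta_k) is taken to be Gamma(a_beta[k], b_beta[k]), so
# E_beta(beta_k) = a_beta[k] / b_beta[k], and g_k(x_n) is a generalised
# softmax of v_k^T phi(x_n) without matching weighting.
def expectation_beta(a_beta, b_beta):
    return a_beta / b_beta                          # length-K vector of E_beta(beta_k)

def mixing(V, Phi):
    A = Phi @ V                                     # (N, K) matrix of v_k^T phi(x_n)
    A -= A.max(axis=1, keepdims=True)               # stabilise the exponentials
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True)   # rows of G sum to 1
```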
The error gradient $\nabla E(V)$ by (7.48) is evaluated in Lines 7 and 8. Line 7 uses the fact that $\Phi^T(G - R)$ results in a $D_V \times K$ matrix that has the vector $\sum_n (g_j(x_n) - r_{nj})\phi(x_n)$ as its $j$th column. Similarly, $V \mathbb{E}_\beta(\beta)$ results in a matrix of the same size, with $\mathbb{E}_\beta(\beta_j) v_j$ as its $j$th column. Line 8 rearranges the matrix $E$, which has $\nabla_{v_j} E(V)$ as its $j$th column, to the gradient vector $e = \nabla E(V)$.
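In NumPy-style notation, and assuming $\Phi$ of shape $(N, D_V)$, $G$ and $R$ of shape $(N, K)$, $V$ of shape $(D_V, K)$ with $v_j$ as its columns, and E_beta as the length-$K$ vector of $\mathbb{E}_\beta(\beta_j)$, Lines 7 and 8 could be sketched as:

```python
import numpy as np

# Sketch of Lines 7-8: assemble the matrix E and flatten it to the gradient
# vector e = nabla E(V); the names and shapes stated above are assumptions.
def error_gradient(Phi, G, R, V, E_beta):
    # j-th column: sum_n (g_j(x_n) - r_nj) phi(x_n) + E_beta(beta_j) v_j
    E = Phi.T @ (G - R) + V * E_beta
    # stack the columns (v_1, ..., v_K ordering) so that e matches the block
    # structure of the Hessian and can later be reshaped back to the shape of V
    return E.ravel(order='F')
```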
The Hessian $H$ is assembled in Line 9 by calling the Function Hessian, and is used in the next line to compute the vector $\Delta v$ by which the mixing weights need to be changed according to the IRLS algorithm (6.5). The mixing weight vector is updated by rearranging $\Delta v$ to the shape of $V$ in Line 12, and adding it to $V$ in the next line.
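A sketch of this Newton-Raphson step, reusing error_gradient from above and the hessian sketch given at the end of this section (all names are assumptions of these sketches):

```python
import numpy as np

# Sketch of Lines 9-13: one IRLS step for the mixing weights.
def irls_step(V, Phi, G, R, E_beta):
    D_V, K = V.shape
    e = error_gradient(Phi, G, R, V, E_beta)         # nabla E(V), length K * D_V
    H = hessian(Phi, G, E_beta, D_V)                 # (K D_V) x (K D_V) Hessian
    delta_v = np.linalg.solve(H, -e)                 # Delta v = -H^{-1} nabla E(V), without an explicit inverse
    return V + delta_v.reshape(D_V, K, order='F')    # rearrange Delta v to the shape of V and add
```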
As the mixing weights have changed, $G$ and $R$ are recomputed with the updated weights, to get $\mathrm{KL}(R\|G)$, and eventually to use it in the next iteration.
The Kullback-Leibler divergence between the responsibilities $R$ and their model $G$ is evaluated in Line 17, and then compared to its value from the last iteration to determine convergence of the IRLS algorithm. Note that due to the use of matrix operations, the elements of $R$ are not checked for being $r_{nk} = 0$ due to $g_k(x) = 0$ when computing the element-wise division $G \oslash R$, which might cause NaN entries in the resulting matrix. Even though these entries are multiplied by $r_{nk} = 0$ thereafter, they first need to be replaced by zero, as otherwise we would still get $0 \cdot \mathrm{NaN} = \mathrm{NaN}$.
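A NaN-safe evaluation of $\mathrm{KL}(R\|G)$ along these lines might look as follows (a sketch; the masking excludes the problematic entries before they enter the sum):

```python
import numpy as np

# Sketch of the NaN-safe evaluation of KL(R || G) in Line 17: entries with
# r_nk = 0 (caused by g_k(x_n) = 0) are excluded from the sum, since the
# naive element-wise computation would yield 0 * NaN = NaN for them.
def kl_divergence(R, G):
    mask = R > 0
    terms = np.zeros_like(R)
    terms[mask] = R[mask] * np.log(G[mask] / R[mask])
    return -np.sum(terms)        # non-negative, zero if and only if R == G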
The IRLS algorithm gives the mean of $q_V(V)$ as the mixing weights that minimise the error function $E(V)$. The covariance matrix $\Lambda_V$ still needs to be evaluated and is by (7.50) the inverse Hessian, as evaluated in Line 19. Due to its dependence on $G$, the last Hessian of the IRLS iteration in Line 9 cannot be used for that purpose, as $G$ has changed thereafter.
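As a sketch, reusing the hessian function sketched below:

```python
import numpy as np

# Sketch of Line 19: the covariance of q_V(V) is the inverse of a Hessian that
# is re-assembled from the final G; the Hessian from the last IRLS step would
# be based on an outdated G.
def posterior_covariance(Phi, G, E_beta, D_V):
    H = hessian(Phi, G, E_beta, D_V)
    return np.linalg.inv(H)      # covariance matrix by (7.50)
```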
To complete TrainMixWeights, let us consider how the Function Hessian assembles the Hessian matrix $H$: it first creates an empty $(K D_V) \times (K D_V)$ matrix that is thereafter filled by its block elements $H_{kj} = H_{jk}$, as given by (7.49). Here, the equality
$$\sum_n \phi(x_n)\, g_k(x_n)\, g_j(x_n)\, \phi(x_n)^T = \Phi^T \bigl(\Phi \odot (g_k \odot g_j)\bigr), \qquad (8.3)$$
where $g_k$ denotes the vector $(g_k(x_1), \dots, g_k(x_N))^T$ and $\odot$ the element-wise product applied to each column of $\Phi$, is used for the off-diagonal blocks of $H$ where $I_{kj} = 0$ in (7.49), and a similar relation is used to get the diagonal blocks of $H$.
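A sketch of this block-wise assembly, under the assumption (not quoted from the text) that (7.49) has the standard softmax-IRLS form $H_{kj} = \sum_n g_k(x_n)(I_{kj} - g_j(x_n))\phi(x_n)\phi(x_n)^T + I_{kj}\mathbb{E}_\beta(\beta_k) I$:

```python
import numpy as np

# Sketch of the Function Hessian; the assumed form of (7.49) is stated in the
# lead-in above. The blocks are filled using the identity (8.3) rather than an
# explicit sum over n.
def hessian(Phi, G, E_beta, D_V):
    N, K = G.shape
    H = np.zeros((K * D_V, K * D_V))                 # empty (K D_V) x (K D_V) matrix
    for k in range(K):
        for j in range(k, K):
            if j == k:
                # diagonal block: Phi^T (Phi * g_k(1 - g_k)) + E_beta(beta_k) I
                gk = G[:, k]
                block = Phi.T @ (Phi * (gk * (1.0 - gk))[:, None]) + E_beta[k] * np.eye(D_V)
            else:
                # off-diagonal block (I_kj = 0): -Phi^T (Phi * (g_k g_j)), by (8.3)
                block = -Phi.T @ (Phi * (G[:, k] * G[:, j])[:, None])
            H[k*D_V:(k+1)*D_V, j*D_V:(j+1)*D_V] = block
            H[j*D_V:(j+1)*D_V, k*D_V:(k+1)*D_V] = block   # H_kj = H_jk
    return H
```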
 