Here, we have used
$$
\mathbf{P}^{-1} = \begin{bmatrix} \mathbf{W} & j\mathbf{W} \\ \mathbf{W}^* & -j\mathbf{W}^* \end{bmatrix}^{-1}
= \frac{1}{2}\begin{bmatrix} \mathbf{W}^{-1} & (\mathbf{W}^*)^{-1} \\ -j\mathbf{W}^{-1} & j(\mathbf{W}^*)^{-1} \end{bmatrix}.
$$
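The block-matrix inverse used here can be sanity-checked numerically. The sketch below assumes the block structure $\mathbf{P} = [\mathbf{W}, j\mathbf{W}; \mathbf{W}^*, -j\mathbf{W}^*]$ (as read from the displayed equation) and verifies that the stated inverse reproduces the identity for a randomly drawn complex $\mathbf{W}$:

```python
import numpy as np

# Check: P = [[W, jW], [W*, -jW*]]  =>
# P^{-1} = (1/2) [[W^{-1}, (W*)^{-1}], [-j W^{-1}, j (W*)^{-1}]]
rng = np.random.default_rng(0)
n = 3
W = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

P = np.block([[W, 1j * W],
              [W.conj(), -1j * W.conj()]])
Pinv = 0.5 * np.block([[np.linalg.inv(W), np.linalg.inv(W.conj())],
                       [-1j * np.linalg.inv(W), 1j * np.linalg.inv(W.conj())]])

print(np.allclose(P @ Pinv, np.eye(2 * n)))  # True
```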
We define $\Delta g(\mathbf{W}, \mathbf{W}^*) \triangleq \Delta \log|\det \mathbf{W}|$ and write the first-order Taylor series expansion given in (1.20) as
$$
\Delta g(\mathbf{W}, \mathbf{W}^*) = \langle \Delta\mathbf{W}, \nabla_{\mathbf{W}^*} \log|\det \mathbf{W}| \rangle + \langle \Delta\mathbf{W}^*, \nabla_{\mathbf{W}} \log|\det \mathbf{W}| \rangle
$$
which, upon comparison with (1.52), gives us the required result for the matrix gradient
$$
\frac{\partial \log|\det \mathbf{W}|}{\partial \mathbf{W}^*} = \mathbf{W}^{-H}. \qquad (1.53)
$$
We can then write the relative (natural) gradient updates to maximize the likelihood function using Eqs. (1.34), (1.49), and (1.53) as
$$
\Delta\mathbf{W} = \left(\mathbf{W}^{-H} - \boldsymbol{\psi}(\mathbf{u})\,\mathbf{x}^H\right)\mathbf{W}^H\mathbf{W} = \left(\mathbf{I} - \boldsymbol{\psi}(\mathbf{u})\,\mathbf{u}^H\right)\mathbf{W}. \qquad (1.54)
$$
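The algebraic step behind (1.54) — right-multiplying by $\mathbf{W}^H\mathbf{W}$ turns the $\mathbf{x}$-dependence into $\mathbf{u} = \mathbf{W}\mathbf{x}$ — can be verified numerically. In this check $\boldsymbol{\psi}$ is just an arbitrary complex vector standing in for the score function; no specific form of (1.50) is assumed:

```python
import numpy as np

# Check: (W^{-H} - psi x^H) W^H W == (I - psi u^H) W, with u = W x.
rng = np.random.default_rng(1)
n = 4
W = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
psi = rng.standard_normal(n) + 1j * rng.standard_normal(n)  # arbitrary stand-in

u = W @ x
WH = W.conj().T
lhs = (np.linalg.inv(WH) - np.outer(psi, x.conj())) @ WH @ W
rhs = (np.eye(n) - np.outer(psi, u.conj())) @ W
print(np.allclose(lhs, rhs))  # True
```

The identity holds because $\mathbf{x}^H\mathbf{W}^H = (\mathbf{W}\mathbf{x})^H = \mathbf{u}^H$, so it is independent of the particular score function used.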
The update given above and the score function $\boldsymbol{\psi}(\mathbf{u})$ defined in (1.50) coincide with the one derived in [20] using a $\mathbb{C} \mapsto \mathbb{R}^{2n}$ isomorphic mapping in a relative gradient update framework, and with the one given in [34] considering separate derivatives.
The update equation given in (1.54) can also be derived without explicit use of the relative gradient update rule given in (1.34). We can use (1.49), (1.53), and $\partial\mathbf{u} = (\partial\mathbf{W})\mathbf{x}$ to write the first-order differential of the likelihood term $\ell_t(\mathbf{W})$ as
$$
\partial \ell_t = \mathrm{Trace}\left(\partial\mathbf{W}\,\mathbf{W}^{-1}\right) + \mathrm{Trace}\left(\partial\mathbf{W}^*\,(\mathbf{W}^*)^{-1}\right) - \boldsymbol{\psi}^H(\mathbf{u})\,\partial\mathbf{u} - \boldsymbol{\psi}^T(\mathbf{u})\,\partial\mathbf{u}^*. \qquad (1.55)
$$
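The two trace terms in (1.55) are complex conjugates, so together they equal $2\,\mathrm{Re}\,\mathrm{Trace}(\partial\mathbf{W}\,\mathbf{W}^{-1})$, the first-order change of $\log|\det\mathbf{W}|^2$ (the log-determinant part of the complex likelihood implied by those two terms). A quick finite-difference check:

```python
import numpy as np

# Check: d log|det W|^2 ~= Trace(dW W^{-1}) + Trace(dW* (W*)^{-1})
#                        = 2 Re Trace(dW W^{-1})
rng = np.random.default_rng(2)
n = 3
W = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
dW = 1e-6 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))

def f(M):
    # log |det M|^2
    return np.log(np.abs(np.linalg.det(M)) ** 2)

numeric = f(W + dW) - f(W)
analytic = 2 * np.real(np.trace(dW @ np.linalg.inv(W)))
print(np.isclose(numeric, analytic, rtol=1e-3))
```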
Defining $\partial\mathbf{Z} \triangleq (\partial\mathbf{W})\mathbf{W}^{-1}$, we obtain $\partial\mathbf{u} = (\partial\mathbf{W})\mathbf{x} = (\partial\mathbf{W})\mathbf{W}^{-1}\mathbf{u} = (\partial\mathbf{Z})\mathbf{u}$ and $\partial\mathbf{u}^* = (\partial\mathbf{Z}^*)\mathbf{u}^*$. By treating $\mathbf{W}$ as a constant matrix, the differential matrix $\partial\mathbf{Z}$ has components $\partial z_{ij}$ that are linear combinations of $\partial w_{ij}$, and it is a non-integrable differential form. However, this transformation allows us to easily write (1.55) as
$$
\partial \ell_t = \mathrm{Trace}(\partial\mathbf{Z}) + \mathrm{Trace}(\partial\mathbf{Z}^*) - \boldsymbol{\psi}^H(\mathbf{u})(\partial\mathbf{Z})\mathbf{u} - \boldsymbol{\psi}^T(\mathbf{u})(\partial\mathbf{Z}^*)\mathbf{u}^* \qquad (1.56)
$$
where we have treated $\mathbf{Z}$ and $\mathbf{Z}^*$ as two independent variables using Wirtinger calculus. Therefore, the gradient update rule for $\mathbf{Z}$ is given by
$$
\Delta\mathbf{Z} = \frac{\partial \ell_t}{\partial \mathbf{Z}^*} = \left(\mathbf{I} - \mathbf{u}^*\boldsymbol{\psi}^T(\mathbf{u})\right)^T = \mathbf{I} - \boldsymbol{\psi}(\mathbf{u})\,\mathbf{u}^H \qquad (1.57)
$$
which is equivalent to (1.54) since $\partial\mathbf{Z} = (\partial\mathbf{W})\mathbf{W}^{-1}$.
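In practice, the resulting update is applied iteratively as $\mathbf{W} \leftarrow \mathbf{W} + \mu\,(\mathbf{I} - \boldsymbol{\psi}(\mathbf{u})\mathbf{u}^H)\mathbf{W}$. The sketch below exercises the mechanics of one such loop on synthetic data; the split-tanh score function and the step size $\mu$ are illustrative placeholders only, not the $\boldsymbol{\psi}$ of (1.50), which depends on the assumed source density:

```python
import numpy as np

def score(u):
    # Illustrative placeholder score, applied to real and imaginary parts.
    return np.tanh(u.real) + 1j * np.tanh(u.imag)

def relative_gradient_step(W, x, mu=0.01):
    # One relative-gradient update, W <- W + mu (I - psi(u) u^H) W, per (1.54).
    u = W @ x
    n = W.shape[0]
    return W + mu * (np.eye(n) - np.outer(score(u), u.conj())) @ W

rng = np.random.default_rng(3)
n = 2
W = np.eye(n, dtype=complex)
for _ in range(100):
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    W = relative_gradient_step(W, x)
print(W.shape)  # (2, 2)
```

Because the update multiplies the gradient term by $\mathbf{W}$ on the right, the iteration is equivariant: its behavior does not depend on the conditioning of the mixing matrix.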
 