where a is a normalization constant and h is a constant that adjusts the degree of
smoothing of the estimated pdf. The learning algorithm can be derived by
maximum-likelihood estimation. In a probabilistic setting, it is usual to maximize
the log-likelihood of equation (A.3) with respect to the unknown matrix W:
\[
\frac{dL(W)}{dW} = \frac{d \log\left( |\det W| \, p(y) \right)}{dW} = \frac{d \log |\det W|}{dW} + \frac{d \log p(y)}{dW} \tag{A.6}
\]
where
\[
\frac{d \log |\det W|}{dW} = (W^{T})^{-1} \tag{A.7}
\]
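The identity in (A.7) can be checked numerically. The following is a minimal sketch in plain Python, restricted to a 2x2 matrix so that the determinant and inverse have closed forms; the matrix entries and the helper names (`log_abs_det`, `inv_transpose`, `numeric_gradient`) are illustrative assumptions, not part of the original derivation.

```python
import math

def log_abs_det(W):
    """log|det W| for a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = W
    return math.log(abs(a * d - b * c))

def inv_transpose(W):
    """(W^T)^{-1} for a 2x2 matrix, from the closed-form inverse."""
    (a, b), (c, d) = W
    det = a * d - b * c
    # W^{-1} = [[d, -b], [-c, a]] / det, so its transpose is:
    return [[d / det, -c / det], [-b / det, a / det]]

def numeric_gradient(W, eps=1e-6):
    """Central finite differences of log|det W| with respect to each entry of W."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            W_plus = [row[:] for row in W]
            W_minus = [row[:] for row in W]
            W_plus[i][j] += eps
            W_minus[i][j] -= eps
            grad[i][j] = (log_abs_det(W_plus) - log_abs_det(W_minus)) / (2 * eps)
    return grad

W = [[2.0, 1.0], [0.5, 3.0]]  # arbitrary invertible example matrix
analytic = inv_transpose(W)
numeric = numeric_gradient(W)
max_error = max(
    abs(analytic[i][j] - numeric[i][j]) for i in range(2) for j in range(2)
)
print(max_error)  # should be close to zero
```

The two gradients should agree up to finite-difference error, which supports the claim that the entry-wise derivative of log|det W| is the inverse transpose of W.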
Imposing independence on y and using equation (A.4):
\[
\frac{d \log p(y)}{dW} = \sum_{m=1}^{M} \frac{d \log p(y_m)}{dW} = \sum_{m=1}^{M} \frac{1}{p(y_m)} \frac{dp(y_m)}{dW} = \sum_{m=1}^{M} \frac{1}{p(y_m)} \frac{dp(y_m)}{dy_m} \frac{dy_m}{dW} \tag{A.8}
\]
where, using equation (A.5):
\[
\frac{dp(y_m)}{dy_m} = -\frac{a}{h} \sum_{n'} \left( \frac{y_m - y_m^{(n')}}{h} \right) e^{-\frac{1}{2} \left( \frac{y_m - y_m^{(n')}}{h} \right)^{2}}, \quad m = 1 \ldots M \tag{A.9}
\]
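As an illustration of (A.9), the sketch below evaluates a Gaussian-kernel density estimate and its derivative, then compares the derivative with a finite-difference approximation. It assumes that (A.5) has the form p(y_m) = a Σ_{n'} exp(-½((y_m - y_m^{(n')})/h)²), and it holds the samples y_m^{(n')} fixed. The sample values, the bandwidth `h`, and the specific choice of `a` are hypothetical.

```python
import math

def kernel_pdf(y, samples, h, a):
    """p(y) = a * sum_n' exp(-0.5 * ((y - y_n') / h)**2), the assumed form of (A.5)."""
    return a * sum(math.exp(-0.5 * ((y - s) / h) ** 2) for s in samples)

def kernel_pdf_derivative(y, samples, h, a):
    """dp/dy = -(a / h**2) * sum_n' (y - y_n') * exp(-0.5 * ((y - y_n') / h)**2)."""
    return -(a / h ** 2) * sum(
        (y - s) * math.exp(-0.5 * ((y - s) / h) ** 2) for s in samples
    )

samples = [-1.2, -0.3, 0.1, 0.8, 1.5]  # hypothetical values of y_m^(n')
h = 0.5                                 # hypothetical smoothing constant
# Assumed normalization for a Gaussian kernel with N samples.
a = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))

y, eps = 0.4, 1e-6
analytic = kernel_pdf_derivative(y, samples, h, a)
numeric = (
    kernel_pdf(y + eps, samples, h, a) - kernel_pdf(y - eps, samples, h, a)
) / (2 * eps)
print(analytic, numeric)  # the two values should nearly coincide
```

The factor -(a / h**2) in the code is the same as the -(a / h) prefactor times the 1 / h inside the sum of (A.9); it is only written in a more compact form.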
Let us call $w_m$ the m-th row of W. Then $y_m = w_m x$, and
\[
\frac{dy_m}{dW} = M_m \tag{A.10}
\]
where $M_m(l, l') = \delta(l - m) \, x_{l'}$.
Substituting equations (A.5), (A.9), and (A.10) in equation (A.8), we have:
\[
\frac{d \log p(y)}{dW} = \sum_{m=1}^{M} f(y_m) \, M_m \tag{A.11}
\]
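Because M_m is zero everywhere except in its m-th row, which holds the entries of x, the sum in (A.11) is equivalent to the outer product of the vector of f(y_m) values with x. The sketch below builds the sum term by term and compares it with that outer product. The vectors `x` and `f_values` are hypothetical numbers chosen for illustration, and rows are indexed from 0 rather than 1.

```python
def selector_matrix(m, x, M):
    """M_m(l, l') = delta(l - m) * x_l': zero except row m, which holds x."""
    return [[x[lp] if l == m else 0.0 for lp in range(len(x))] for l in range(M)]

def weighted_sum(f_values, x):
    """Accumulate sum_m f(y_m) * M_m explicitly, as written in (A.11)."""
    M = len(f_values)
    total = [[0.0] * len(x) for _ in range(M)]
    for m in range(M):
        M_m = selector_matrix(m, x, M)
        for l in range(M):
            for lp in range(len(x)):
                total[l][lp] += f_values[m] * M_m[l][lp]
    return total

x = [0.5, -1.0, 2.0]         # hypothetical observation vector
f_values = [0.3, -0.7, 1.1]  # hypothetical values of f(y_m)

explicit = weighted_sum(f_values, x)
outer = [[f * xi for xi in x] for f in f_values]  # outer product f(y) x^T

max_diff = max(
    abs(explicit[l][lp] - outer[l][lp])
    for l in range(len(f_values))
    for lp in range(len(x))
)
print(max_diff)  # should be zero up to rounding
```

In practice the outer-product form avoids constructing the M_m matrices at all, which is why the explicit sum is shown here only as a check.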
where
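\[
f(y_m) = \frac{1}{h^{2}} \left[ \frac{\sum_{n'} y_m^{(n')} \, e^{-\frac{1}{2} \left( \frac{y_m - y_m^{(n')}}{h} \right)^{2}}}{\sum_{n'} e^{-\frac{1}{2} \left( \frac{y_m - y_m^{(n')}}{h} \right)^{2}}} - y_m \right] \tag{A.12}
\]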
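The function f(y_m) in (A.12) is the derivative of log p(y_m) with respect to y_m, so the normalization constant a cancels and only a kernel-weighted mean of the samples remains. The following sketch computes f and checks it against a finite-difference derivative of the log-density. It assumes the Gaussian-kernel form of the density and treats the samples as fixed; the function names and sample values are hypothetical.

```python
import math

def score(y, samples, h):
    """f(y) = (1 / h**2) * (sum_n' y_n' k_n' / sum_n' k_n' - y), Gaussian weights k_n'."""
    weights = [math.exp(-0.5 * ((y - s) / h) ** 2) for s in samples]
    weighted_mean = sum(s * k for s, k in zip(samples, weights)) / sum(weights)
    return (weighted_mean - y) / h ** 2

def log_pdf(y, samples, h, a=1.0):
    """log p(y) for the assumed kernel estimate; the value of a does not affect f."""
    return math.log(a * sum(math.exp(-0.5 * ((y - s) / h) ** 2) for s in samples))

samples = [-1.2, -0.3, 0.1, 0.8, 1.5]  # hypothetical values of y_m^(n')
h = 0.5                                 # hypothetical smoothing constant

y, eps = 0.4, 1e-6
analytic = score(y, samples, h)
numeric = (log_pdf(y + eps, samples, h) - log_pdf(y - eps, samples, h)) / (2 * eps)
print(analytic, numeric)  # the two values should nearly coincide
```

With a single sample the weighted mean is that sample itself, so f reduces to (y^{(1)} - y) / h², which gives a quick sanity check on the sign.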