where $a$ is a normalization constant and $h$ is a constant that adjusts the degree of
smoothing of the estimated pdf. The learning algorithm can be derived by
maximum-likelihood estimation. In a probabilistic setting, it is usual to maximize
the log-likelihood of equation (A.3) with respect to the unknown matrix $W$:
\[
\frac{dL(W)}{dW}
= \frac{d \log\left( |\det W| \, p(y) \right)}{dW}
= \frac{d \log |\det W|}{dW} + \frac{d \log p(y)}{dW},
\tag{A.6}
\]
where
\[
\frac{d \log |\det W|}{dW} = \left( W^{T} \right)^{-1}.
\tag{A.7}
\]
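The identity in (A.7) can be checked numerically. The following is a minimal sketch, assuming a 2x2 matrix stored as nested lists; the helper names (`log_abs_det`, `inv_transpose`, `numeric_gradient`) and the test matrix are illustrative and do not come from the text. It compares the closed form $(W^{T})^{-1}$ with central finite differences of $\log |\det W|$.

```python
import math

def log_abs_det(W):
    """log|det W| for a 2x2 matrix given as nested lists."""
    return math.log(abs(W[0][0] * W[1][1] - W[0][1] * W[1][0]))

def inv_transpose(W):
    """(W^T)^{-1} for a 2x2 matrix, i.e. the closed form in (A.7)."""
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    # W^{-1} = (1/det) [[d, -b], [-c, a]]; transposing swaps the off-diagonal entries.
    return [[W[1][1] / det, -W[1][0] / det],
            [-W[0][1] / det, W[0][0] / det]]

def numeric_gradient(W, eps=1e-6):
    """Central finite differences of log|det W| with respect to each entry of W."""
    grad = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            plus = [row[:] for row in W]
            minus = [row[:] for row in W]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (log_abs_det(plus) - log_abs_det(minus)) / (2 * eps)
    return grad

W = [[2.0, 1.0], [0.5, 3.0]]  # arbitrary invertible example matrix
analytic = inv_transpose(W)
numeric = numeric_gradient(W)
max_error = max(abs(analytic[i][j] - numeric[i][j])
                for i in range(2) for j in range(2))
```

For a well-conditioned matrix such as this one, `max_error` should be far below the size of the gradient entries, which is what one expects if the closed form is right.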
Imposing independence on $y$ and using equation (A.4):
\[
\frac{d \log p(y)}{dW}
= \sum_{m=1}^{M} \frac{d \log p(y_m)}{dW}
= \sum_{m=1}^{M} \frac{1}{p(y_m)} \frac{dp(y_m)}{dW}
= \sum_{m=1}^{M} \frac{1}{p(y_m)} \frac{dp(y_m)}{dy_m} \frac{dy_m}{dW},
\tag{A.8}
\]
where, using equation (A.5):
\[
\frac{dp(y_m)}{dy_m}
= a \sum_{n'} e^{-\frac{1}{2} \left( \frac{y_m - y_m^{(n')}}{h} \right)^{2}}
\left( -\frac{y_m - y_m^{(n')}}{h} \right) \frac{1}{h},
\qquad m = 1 \ldots M.
\tag{A.9}
\]
Let us call $w_m$ the $m$-th row of $W$. Then $y_m = w_m x$, and
\[
\frac{dy_m}{dW} = M_m,
\tag{A.10}
\]
where $M_m(l, l') = \delta(l - m) \, x_{l'}$.
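As a small illustration, take $M = 2$ (a case chosen here only for concreteness, not taken from the text). Writing $y_m = \sum_{l'} W_{m l'} x_{l'}$ gives $\partial y_m / \partial W_{l l'} = \delta(l - m) \, x_{l'}$, so that

\[
M_1 = \begin{pmatrix} x_1 & x_2 \\ 0 & 0 \end{pmatrix},
\qquad
M_2 = \begin{pmatrix} 0 & 0 \\ x_1 & x_2 \end{pmatrix}.
\]

Only the $m$-th row of $M_m$ is nonzero, and that row is $x^{T}$. Consequently, a weighted sum $\sum_m c_m M_m$ is the matrix whose $l$-th row is $c_l \, x^{T}$.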
Substituting equations (A.5), (A.9), and (A.10) in equation (A.8) we have:
\[
\frac{d \log p(y)}{dW} = \sum_{m=1}^{M} f(y_m) \, M_m,
\tag{A.11}
\]
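where
\[
f(y_m) = \frac{1}{h^{2}} \left[
\frac{\displaystyle \sum_{n'} y_m^{(n')} \, e^{-\frac{1}{2} \left( \frac{y_m - y_m^{(n')}}{h} \right)^{2}}}
     {\displaystyle \sum_{n'} e^{-\frac{1}{2} \left( \frac{y_m - y_m^{(n')}}{h} \right)^{2}}}
- y_m \right].
\tag{A.12}
\]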
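The derivation can be sketched in code for a single observation $x$. This is a minimal sketch under stated assumptions: the density in (A.5) is taken to be the Gaussian-kernel estimate $p(y_m) = a \sum_{n'} e^{-\frac{1}{2} ((y_m - y_m^{(n')})/h)^2}$, the stored samples $y_m^{(n')}$ are held fixed while differentiating (as the chain rule in (A.8) does), and $W$ is 2x2. All function names and numeric values below are hypothetical.

```python
import math

def kernel_weights(y, samples, h):
    """Gaussian kernel terms exp(-((y - y_n) / h)^2 / 2), one per stored sample."""
    return [math.exp(-0.5 * ((y - yn) / h) ** 2) for yn in samples]

def log_density(y, samples, h, a=1.0):
    """log p(y) for the assumed kernel estimate p(y) = a * sum of kernel terms."""
    return math.log(a * sum(kernel_weights(y, samples, h)))

def score(y, samples, h):
    """f(y) as in (A.12): (kernel-weighted mean of the samples - y) / h^2."""
    k = kernel_weights(y, samples, h)
    weighted_mean = sum(ki * yn for ki, yn in zip(k, samples)) / sum(k)
    return (weighted_mean - y) / h ** 2

def log_likelihood_gradient(W, x, samples_per_source, h):
    """(W^T)^{-1} + sum_m f(y_m) M_m for a 2x2 W and one observation x."""
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    inv_t = [[W[1][1] / det, -W[1][0] / det],
             [-W[0][1] / det, W[0][0] / det]]
    # y_m = w_m x, with w_m the m-th row of W
    y = [W[m][0] * x[0] + W[m][1] * x[1] for m in range(2)]
    f = [score(y[m], samples_per_source[m], h) for m in range(2)]
    # sum_m f(y_m) M_m has entry (l, l') equal to f(y_l) * x_{l'}
    return [[inv_t[l][lp] + f[l] * x[lp] for lp in range(2)] for l in range(2)]

h = 0.5
samples = [[-1.0, 0.2, 0.9], [0.5, -0.3, 1.4]]  # made-up y_m^(n'), one list per source
W = [[2.0, 1.0], [0.5, 3.0]]
x = [0.3, -0.2]
grad = log_likelihood_gradient(W, x, samples, h)
```

Note that the normalization constant $a$ cancels in $f(y_m)$, which is why `score` does not take it as an argument. In a gradient-ascent scheme, `grad` would be averaged over observations and scaled by a step size. Those details are not specified in this section and are left out of the sketch.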