P_k(\varepsilon) = k \binom{N-1}{k} \frac{dp_i(\varepsilon)}{d\varepsilon}\, p_i^{k-1} (1 - p_i)^{N-k-1}, \qquad (19.5)
where p_i is the mass of the ε-ball. Then the expected value of log p_i will be
E(\log p_i) = \psi(k) - \psi(N), \qquad (19.6)
where ψ(·) is the digamma function:

\psi(t) = \Gamma(t)^{-1} \frac{d\Gamma(t)}{dt}, \qquad (19.7)
where Γ(·) is the gamma function. It holds that ψ(1) = −C, where C is the Euler–Mascheroni constant (C ≈ 0.57721). The mass of the ε-ball can be approximated, provided the p.d.f. is uniform inside the ε-ball (ε is chosen sufficiently small for this to hold), by

p_i(\varepsilon) \approx c_{d_x} \varepsilon^{d_x} P(X = x_i), \qquad (19.8)
where c_{d_x} is the volume of the unit ball in the d_x-dimensional space.
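As a quick numerical check of these constants, the following minimal sketch (the helper name `unit_ball_volume` is my own, not from the text) reproduces ψ(1) = −C and the Euclidean unit-ball volume with SciPy:

```python
import numpy as np
from scipy.special import digamma, gamma

# psi(1) equals minus the Euler-Mascheroni constant.
print(digamma(1.0))      # -0.5772156649015329
print(-np.euler_gamma)   # -0.5772156649015329

def unit_ball_volume(d):
    """Volume of the Euclidean unit ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1)."""
    return np.pi ** (d / 2) / gamma(d / 2 + 1)

print(unit_ball_volume(1))   # 2.0  (the interval [-1, 1])
print(unit_ball_volume(2))   # 3.141592653589793  (the unit disk)
```

Note that with the convention used below, where ε(i) is a diameter rather than a radius, the Euclidean constant becomes π^{d/2}/(2^d Γ(d/2 + 1)), and for the maximum norm it is simply 1.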
From Equation (19.8) we can find an estimator for P(X = x_i):
\log[P(X = x_i)] \approx \psi(k) - \psi(N) - d_x E(\log \varepsilon(i)) - \log c_{d_x}. \qquad (19.9)
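This follows by taking logarithms of (19.8),

\log p_i(\varepsilon) \approx \log c_{d_x} + d_x \log \varepsilon(i) + \log[P(X = x_i)],

and substituting the expectation E(\log p_i) = \psi(k) - \psi(N) from (19.6).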
Finally we obtain the entropy estimator for X [14]:
H(X) = \psi(N) - \psi(k) + \log c_{d_x} + \frac{d_x}{N} \sum_{i=1}^{N} \log \varepsilon(i), \qquad (19.10)

where ε(i) is twice the distance from x_i to its k-th nearest neighbor in the d_x-dimensional space. For the joint entropy we have
H(X, Y) = \psi(N) - \psi(k) + \log\left(c_{d_x} c_{d_y}\right) + \frac{d_x + d_y}{N} \sum_{i=1}^{N} \log \varepsilon(i). \qquad (19.11)
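A minimal sketch of the marginal estimator (19.10), assuming Euclidean distances and a fixed k; the function name is my own and SciPy's k-d tree stands in for whatever neighbor search is used in practice:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gamma

def knn_entropy(x, k=3):
    """H(X) per Eq. (19.10): psi(N) - psi(k) + log c_d + (d/N) * sum_i log eps(i)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    dist, _ = cKDTree(x).query(x, k=k + 1)   # column 0 is the point itself
    eps = 2.0 * dist[:, k]                   # eps(i): twice the k-th neighbor distance
    # eps is a diameter, so the Euclidean unit-ball volume is scaled by 2^-d.
    c_d = np.pi ** (d / 2) / gamma(d / 2 + 1) / 2 ** d
    return digamma(n) - digamma(k) + np.log(c_d) + d * np.mean(np.log(eps))

# Sanity check: a standard normal has H = 0.5 * log(2*pi*e) ~ 1.4189 nats.
rng = np.random.default_rng(0)
print(knn_entropy(rng.standard_normal(5000), k=3))
```

The joint estimator (19.11) is the same computation run on the stacked (x, y) samples with d = d_x + d_y.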
The mutual information I(X;Y) is now ready to be estimated by Equation (19.2). The problem with this estimation is that a fixed number k is used in all estimators, but the distance metrics in the differently scaled spaces (marginal and joint) are not comparable. To avoid this problem, instead of using a fixed k, n_x(i) + 1 and n_y(i) + 1 are used in obtaining the distances, where n_x(i) and n_y(i) are the numbers of samples contained in the bins [x(i) − ε(i)/2, x(i) + ε(i)/2] and [y(i) − ε(i)/2, y(i) + ε(i)/2], respectively, in the x–y scatter diagram. Equation (19.10) becomes
H(X) = \psi(N) - \frac{1}{N} \sum_{i=1}^{N} \psi\left(n_x(i) + 1\right) + \log c_{d_x} + \frac{d_x}{N} \sum_{i=1}^{N} \log \varepsilon(i). \qquad (19.12)
Finally, Equation (19.2) is rewritten as

I(X;Y) = \psi(k) + \psi(N) - \frac{1}{N} \sum_{i=1}^{N} \left[\psi\left(n_x(i) + 1\right) + \psi\left(n_y(i) + 1\right)\right],

since the log c and log ε(i) terms of the marginal and joint entropies cancel.
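A minimal sketch of the resulting adaptive estimator, assuming one-dimensional X and Y, max-norm distances in the joint space, and counts n_x(i), n_y(i) taken strictly inside the half-width ε(i)/2 bins; all names are my own:

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def ksg_mutual_information(x, y, k=3):
    """I(X;Y) ~ psi(k) + psi(N) - mean_i[psi(n_x(i)+1) + psi(n_y(i)+1)]."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)
    n = len(x)
    joint = np.hstack([x, y])
    # eps(i)/2: max-norm distance to the k-th neighbor in the joint space.
    dist, _ = cKDTree(joint).query(joint, k=k + 1, p=np.inf)
    radius = dist[:, k]
    x_tree, y_tree = cKDTree(x), cKDTree(y)
    # Count samples strictly within eps(i)/2 of x(i) and y(i); the -1 drops
    # the point itself, and the tiny shrink keeps the inequality strict.
    nx = np.array([len(x_tree.query_ball_point(x[i], r * (1 - 1e-10))) - 1
                   for i, r in enumerate(radius)])
    ny = np.array([len(y_tree.query_ball_point(y[i], r * (1 - 1e-10))) - 1
                   for i, r in enumerate(radius)])
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))

# Sanity check: for bivariate Gaussians with correlation rho,
# I = -0.5 * log(1 - rho^2); rho = 0.8 gives about 0.511 nats.
rng = np.random.default_rng(1)
x = rng.standard_normal(3000)
y = 0.8 * x + 0.6 * rng.standard_normal(3000)
print(ksg_mutual_information(x, y, k=3))
```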