$b$ discrete bins of equal length. The pdf of each bin is estimated by the relative frequency of occurrence of samples in the bin. Let $X$ be a random variable with $N$ realizations. The $b$ partitions are defined as $a_i = [o + ih,\; o + (i+1)h]$, where $o$ is the value of the origin, $h$ is the width of the bins, and $i = 0, \ldots, b-1$. Let $k_i$ be the number of measurements of $X$ that lie in the interval $a_i$. The pdf $f_i$ of $X$ can be approximated as

$$f_i = \frac{k_i}{N}.$$
As this method is nonparametric, its parameters are not easily determined: the choice of the number of bins $b$, or of their width, can be non-trivial. In any case, the partitioning must be the same for both variables. Although Histogram-based Estimation (HE) is computationally efficient, its results contain more statistical errors than those of other methods.
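As a concrete illustration, the following is a minimal sketch of the resulting plug-in MI estimate: both variables are binned with the same partitioning, relative frequencies $k_i/N$ approximate the pdfs, and these estimates are substituted into the definition of mutual information. The function name histogram_mi and the default $b = 10$ are our own choices, not taken from the text.

import numpy as np

def histogram_mi(x, y, b=10):
    # Bin both variables with the same partitioning (b bins per axis).
    pxy, _, _ = np.histogram2d(x, y, bins=b)
    pxy /= pxy.sum()                  # joint relative frequencies k_ij / N
    px = pxy.sum(axis=1)              # marginal pdf estimate of X
    py = pxy.sum(axis=0)              # marginal pdf estimate of Y
    nz = pxy > 0                      # skip empty bins to avoid log(0)
    return np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))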
4.3 Kernel Estimator
Kernel Density Estimation (KDE) constructs a smooth estimate of the density by centering kernel functions at the data samples [19]. The kernels weight the distance of each point in the sample to the reference point, depending on the form of the kernel function and according to a given bandwidth $h$. In KDE, $h$ plays a similar role to $b$ in HE; in fact, the uniform kernel function yields a histogram. Gaussian kernels are the most commonly used, and we use them as well in this study. Let $\{x_1, \ldots, x_N\}$ be $N$ realizations of the random variable $X$. The pdf estimate using a Gaussian kernel is given by:

$$f(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h\sqrt{2\pi}} \exp\left(-\frac{(x - x_i)^2}{2h^2}\right).$$
This estimation method is quite costly in computational time. Kernel estimators are considered to be very good for density estimation of one-dimensional data; however, this is not always the case for mutual information estimation.
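For reference, here is a minimal sketch that evaluates the Gaussian-kernel estimate above at a set of query points. The function name gaussian_kde_pdf is ours, and the bandwidth $h$ is left as a user-supplied parameter, since the text does not fix a selection rule for it.

import numpy as np

def gaussian_kde_pdf(samples, x, h):
    # One Gaussian kernel of bandwidth h centered at each sample x_i;
    # the estimate f(x) averages the N kernel contributions.
    u = (x[:, None] - samples[None, :]) / h
    return np.mean(np.exp(-0.5 * u**2), axis=1) / (h * np.sqrt(2.0 * np.pi))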
4.4 $k$-Nearest Neighbor Estimator
Kraskov et al. [14] present a new estimator based on distances of $k$-Nearest Neighbors (KNN) to estimate densities. The authors consider a bivariate sample and, for each reference point $i$, compute a distance, noted $\epsilon(i)$, such that $k$ neighbors lie within it. The number of points within distance $\epsilon(i)/2$ gives the estimate of the joint density at the point $i$. The distance is then projected into each variable subspace to estimate the marginal density of each variable. The estimation of MI using KNN depends on the choice of $k$. In [14] the authors explain that statistical errors increase when $k$ decreases. In practice, we should use $k > 1$; however, if $k$ is too large, systematic errors can outweigh the decrease of statistical errors. KNN gives good results, with fewer statistical errors than the previous methods, but with a computationally heavy algorithm [21].
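The following is a minimal sketch of Algorithm 1 of Kraskov et al. [14] for one-dimensional $X$ and $Y$, using the usual max-norm in the joint space. The name ksg_mi and the brute-force $O(N^2)$ distance computation are our own simplifications; the quadratic cost also illustrates why the algorithm is considered computationally heavy.

import numpy as np
from scipy.special import digamma

def ksg_mi(x, y, k=3):
    # Pairwise distances in each subspace and, via the max-norm,
    # in the joint (X, Y) space.
    n = len(x)
    dx = np.abs(x[:, None] - x[None, :])
    dy = np.abs(y[:, None] - y[None, :])
    dz = np.maximum(dx, dy)
    np.fill_diagonal(dz, np.inf)              # exclude each point itself
    # epsilon(i)/2: distance from point i to its k-th nearest neighbor.
    eps_half = np.sort(dz, axis=1)[:, k - 1]
    # n_x(i), n_y(i): neighbors strictly within epsilon(i)/2 in each subspace.
    nx = np.sum(dx < eps_half[:, None], axis=1) - 1   # drop self-distance 0
    ny = np.sum(dy < eps_half[:, None], axis=1) - 1
    # I(X;Y) = psi(k) + psi(N) - < psi(n_x + 1) + psi(n_y + 1) >
    return digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))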