b discrete bins of equal length. The pdf of each bin is estimated by the relative frequency of occurrence of samples in the bin. Let X be a random variable with N realizations. The b partitions are defined as $a_i = [o + ih,\ o + (i+1)h]$, where $o$ is the origin, $h$ is the width of the bins, and $i = 0, \ldots, b-1$. Let $k_i$ be the number of measurements of X that lie in the interval $a_i$. The pdf $f_i$ of X can be approximated as
$$f_i = \frac{k_i}{N}.$$
Although this method is nonparametric, its free parameters are not easily determined: the choice of the number of bins b, or of their width h, can be non-trivial. In any case, the partitioning must be the same for both variables. Even though Histogram-based Estimation (HE) is computationally efficient, its results contain more statistical errors than those of other methods.
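To make the procedure concrete, the following is a minimal sketch of a histogram-based MI estimate, assuming NumPy; the function name histogram_mi, the synthetic samples, and the bin count b = 30 are illustrative choices, not values prescribed by the text.

```python
# Histogram-based MI estimate; a minimal sketch using NumPy.
import numpy as np

def histogram_mi(x, y, b):
    """Estimate I(X;Y) from samples using b equal-width bins per variable."""
    n = len(x)
    # Joint relative frequencies f_ij = k_ij / N; the marginals are derived
    # from the same grid, so the partitioning is shared by both variables.
    joint, _, _ = np.histogram2d(x, y, bins=b)
    p_xy = joint / n
    p_x = p_xy.sum(axis=1)   # marginal pdf estimate of X
    p_y = p_xy.sum(axis=0)   # marginal pdf estimate of Y
    # Sum over non-empty bins only, to avoid log(0)
    nz = p_xy > 0
    return np.sum(p_xy[nz] * np.log(p_xy[nz] / np.outer(p_x, p_y)[nz]))

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)
y = x + 0.5 * rng.normal(size=10_000)   # correlated with x
print(histogram_mi(x, y, b=30))
```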
4.3 Kernel Estimator
Kernel Density Estimation (KDE) constructs a smooth estimate of the density by centering kernel functions at the data samples [19]. The kernels weight the distance of each point in the sample to the reference point, depending on the form of the kernel function and according to a given bandwidth h. In KDE, h plays a similar role to b in HE; in fact, the uniform kernel function yields a histogram. Gaussian kernels are the most commonly used, and we use them in this study as well.
Let $\{x_1, \ldots, x_N\}$ be N realizations of the random variable X. The pdf estimate using a Gaussian kernel is given by:
$$f(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{h\sqrt{2\pi}} \exp\left(-\frac{(x - x_i)^2}{2h^2}\right).$$
This estimation method is quite costly in computational time. Kernel estimators are considered very good for density estimation of one-dimensional data; however, this is not always the case for mutual information estimation.
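As an illustration of the formula above, here is a minimal sketch of a Gaussian-kernel pdf estimate, assuming NumPy; the function name kde_pdf, the bandwidth h = 0.3, and the synthetic sample are illustrative assumptions.

```python
# Gaussian-kernel pdf estimate following the formula above; a minimal
# sketch using NumPy.
import numpy as np

def kde_pdf(x, samples, h):
    """Evaluate the Gaussian KDE f(x) at point(s) x from N samples."""
    x = np.atleast_1d(x)[:, None]             # shape (m, 1)
    diffs = x - samples[None, :]              # (x - x_i) for every sample
    kernels = np.exp(-diffs**2 / (2 * h**2)) / (h * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)               # average of the N kernels

rng = np.random.default_rng(0)
samples = rng.normal(size=1_000)
print(kde_pdf([0.0, 1.0], samples, h=0.3))    # close to the true N(0,1) pdf
```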
4.4 k-Nearest Neighbor Estimator
Kraskov et al. [14] present a new estimator based on distances of k-Nearest Neighbors (KNN) to estimate densities. The authors consider a bivariate sample and, for each reference point i, compute a distance ε(i) such that k neighbors lie within it. The number of points within distance ε(i)/2 gives the estimate of the joint density at the point i. The distance is then projected onto each variable's subspace to estimate the marginal density of each variable. The estimation of MI using KNN depends on the choice of k. In [14] the authors explain that statistical errors increase when k decreases. In practice, we should use k > 1; however, if k is too large, systematic errors can outweigh the decrease in statistical errors. KNN gives good results with fewer statistical errors than the previous methods, but with a computationally heavy algorithm [21].
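The following is a minimal sketch in the spirit of the Kraskov et al. estimator [14], assuming SciPy's cKDTree and digamma; the function name knn_mi and the boundary handling (counting neighbors within ε(i)/2 and excluding the point itself) are sketch-level assumptions, not a definitive implementation of [14].

```python
# k-nearest-neighbor MI estimate in the spirit of Kraskov et al. [14];
# a minimal sketch using SciPy.
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma

def knn_mi(x, y, k=3):
    """Kraskov-style I(X;Y) estimate from 1-D samples x and y."""
    n = len(x)
    z = np.column_stack([x, y])
    # Distance eps(i)/2 to the k-th neighbor in the joint space (max-norm);
    # index 0 of the query result is the reference point itself.
    d, _ = cKDTree(z).query(z, k=k + 1, p=np.inf)
    eps_half = d[:, k]
    # Count neighbors inside eps(i)/2 in each marginal subspace, excluding
    # the point itself. (Kraskov uses a strict inequality; query_ball_point
    # counts distances <= r, which only matters for exact ties.)
    tx, ty = cKDTree(x[:, None]), cKDTree(y[:, None])
    n_x = np.array([len(tx.query_ball_point([xi], r, p=np.inf)) - 1
                    for xi, r in zip(x, eps_half)])
    n_y = np.array([len(ty.query_ball_point([yi], r, p=np.inf)) - 1
                    for yi, r in zip(y, eps_half)])
    return digamma(k) + digamma(n) - np.mean(digamma(n_x + 1) + digamma(n_y + 1))

rng = np.random.default_rng(0)
x = rng.normal(size=2_000)
y = x + 0.5 * rng.normal(size=2_000)
print(knn_mi(x, y, k=3))   # compare against the analytic MI of this Gaussian pair
```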