[Figure E.1: plot of $f_n(x)$ (vertical axis, 0 to 0.25) against $x$ (horizontal axis, 0 to 8).]
Fig. E.1 Parzen window PDF estimates for the data instances represented by the black circles. The estimates are obtained with a Gaussian kernel with bandwidths: $h = 0.2$ (dotted line), $h = 0.5$ (dashed line), and $h = 1$ (solid line).
to emphasize the fact that $f_n(x)$ is a random variable with a distribution dependent on the joint distribution of the i.i.d. $X_i$.
The Parzen window estimator can also be written as a convolution of the
Parzen window with the empirical distribution:
$$f_n(x) = K_h \otimes \mu_n(x) = \int \frac{1}{h} K\!\left(\frac{x - y}{h}\right) \mu_n(y)\, dy, \tag{E.4}$$
where $\mu_n(x) = \frac{1}{n}\sum_{i=1}^{n} \delta(x - x_i)$ is a Dirac-$\delta$ comb representing the empirical density, and $K_h(x) = \frac{1}{h} K\!\left(\frac{x}{h}\right)$. We may also write $K_h(x)$ as $K(x; h)$; $K(x)$ is then $K(x; 1)$. Note that $\int |K_h(x)|\, dx = \int |K(x)|\, dx$.
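Because the empirical density is a comb of Dirac deltas, the convolution collapses to a finite sum, $f_n(x) = \frac{1}{n}\sum_{i} K_h(x - x_i)$. The following is a minimal sketch of that sum, assuming a Gaussian kernel; the five data values, the helper names, and the midpoint-rule integrator are hypothetical choices made for illustration only. It also checks numerically that $K_h$ and $K$ have the same integral of absolute value.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def scaled_kernel(x, h, kernel=gaussian_kernel):
    """Scaled kernel K_h(x) = (1/h) K(x/h)."""
    return kernel(x / h) / h


def parzen_estimate(x, data, h, kernel=gaussian_kernel):
    """Convolution of K_h with the Dirac comb: (1/n) * sum_i K_h(x - x_i)."""
    return sum(scaled_kernel(x - xi, h, kernel) for xi in data) / len(data)


def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * width) for k in range(steps)) * width


# Hypothetical data instances and bandwidth, chosen only for illustration.
data = [0.5, 1.2, 2.0, 3.1, 4.7]
h = 0.5

# The integral of |K_h| should match the integral of |K|.
area_kh = integrate(lambda x: abs(scaled_kernel(x, h)), -10.0, 10.0)
area_k = integrate(lambda x: abs(gaussian_kernel(x)), -10.0, 10.0)

# With a kernel that integrates to one, the estimate is itself a density.
area_fn = integrate(lambda x: parzen_estimate(x, data, h), -10.0, 15.0)

print(f"integral |K_h| = {area_kh:.6f}, integral |K| = {area_k:.6f}")
print(f"integral f_n   = {area_fn:.6f}")
```

The integration limits are wide enough that the truncated Gaussian tails are negligible for this bandwidth; a different kernel or bandwidth may need different limits.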
For kernels satisfying the above conditions the convolution operation yields an estimate $f_n(x)$ which is a smoothed version of $f(x)$. The degree of smoothing increases with $h$, as exemplified by Fig. E.1, showing three different estimates of a PDF computed with a Gaussian kernel on a 40-instance dataset, randomly and independently drawn from the chi-square distribution with three degrees of freedom.
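The effect of the bandwidth can be illustrated with a small experiment in the spirit of Fig. E.1. The sketch below is not the code used for the figure: it draws its own 40 chi-square instances (as sums of three squared standard normals) with an arbitrary seed, and it quantifies smoothness with a roughness measure, the integral of $f_n'(x)^2$ approximated by finite differences, which is an assumption introduced here rather than something taken from the text.

```python
import math
import random


def parzen_gaussian(x, data, h):
    """Parzen window estimate at x with a Gaussian kernel of bandwidth h."""
    norm = len(data) * h * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / norm


def roughness(data, h, step=0.01, margin=5.0):
    """Finite-difference approximation of the integral of f_n'(x)^2."""
    lo = min(data) - margin
    hi = max(data) + margin
    count = int((hi - lo) / step)
    values = [parzen_gaussian(lo + k * step, data, h) for k in range(count + 1)]
    # sum of ((b - a) / step)^2 * step over consecutive grid points
    return sum((b - a) ** 2 for a, b in zip(values, values[1:])) / step


rng = random.Random(0)  # arbitrary seed, for reproducibility only

# A chi-square variate with three degrees of freedom is the sum of three
# squared standard normal variates.
sample = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)) for _ in range(40)]

roughness_by_h = {h: roughness(sample, h) for h in (0.2, 0.5, 1.0)}
for h, value in roughness_by_h.items():
    print(f"h = {h}: roughness = {value:.4f}")
```

A lower roughness value corresponds to a smoother curve, so the values are expected to decrease as $h$ grows from 0.2 to 1.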
The Parzen window estimate $f_n$ enjoys the following important properties:
1. For a dataset with sample mean $\bar{x}$ and sample variance $s^2$, if $K$ is a symmetric function, the mean $\mu_n$ and the variance $\sigma_n^2$ of $f_n$ satisfy:
$$\mu_n = \bar{x}; \qquad \sigma_n^2 \equiv V[f_n(x)] = s^2 + h^2 \int x^2 K(x)\, dx, \tag{E.5}$$
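The mean and variance relations in (E.5) can be checked numerically. The sketch below is an illustration under stated assumptions: it uses a Gaussian kernel (for which $\int x^2 K(x)\,dx = 1$), a hypothetical six-point dataset, a midpoint-rule integrator, and it takes $s^2$ to be the $1/n$ form of the sample variance, which is the form for which the identity holds exactly.

```python
import math


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u), symmetric about zero."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def parzen_estimate(x, data, h):
    """Parzen window estimate f_n(x) = (1/(n h)) * sum_i K((x - x_i) / h)."""
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)


def integrate(func, lo, hi, steps=20000):
    """Midpoint-rule approximation of the integral of func over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(func(lo + (k + 0.5) * width) for k in range(steps)) * width


# Hypothetical data instances and bandwidth, chosen only for illustration.
data = [0.4, 1.1, 1.9, 2.6, 3.8, 5.2]
h = 0.5
lo, hi = min(data) - 10.0 * h, max(data) + 10.0 * h

# Sample statistics; the variance uses the 1/n form (an assumption, see above).
sample_mean = sum(data) / len(data)
sample_var = sum((xi - sample_mean) ** 2 for xi in data) / len(data)

# Second moment of the kernel, the integral of x^2 K(x).
kernel_second_moment = integrate(lambda u: u * u * gaussian_kernel(u), -10.0, 10.0)

# Mean and variance of the density f_n, obtained by numerical integration.
mean_fn = integrate(lambda x: x * parzen_estimate(x, data, h), lo, hi)
var_fn = integrate(lambda x: (x - mean_fn) ** 2 * parzen_estimate(x, data, h), lo, hi)

predicted_var = sample_var + h * h * kernel_second_moment

print(f"mean of f_n     = {mean_fn:.6f}, sample mean    = {sample_mean:.6f}")
print(f"variance of f_n = {var_fn:.6f}, s^2 + h^2 * m2 = {predicted_var:.6f}")
```

The variance of $f_n$ always exceeds the sample variance by the term $h^2 \int x^2 K(x)\,dx$, which is the spread added by the kernel itself.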