$$f_{\text{fat}}(e) = \mu_n \otimes G_{h_{\text{fat}}}(e), \quad \text{with } h_{\text{fat}} > h_{\text{IMSE}}. \qquad (3.26)$$
By Proposition 3.1, there is a Gaussian kernel $G_h(e)$ such that $G_{h_{\text{fat}}}(e) = G_{h_{\text{IMSE}}}(e) \otimes G_h(e)$. Hence,
$$f_{\text{fat}}(e) = \mu_n \otimes G_{h_{\text{fat}}}(e) = \underbrace{\mu_n \otimes G_{h_{\text{IMSE}}}(e)}_{f_n(e)} \otimes\, G_h(e) \xrightarrow[n \to \infty]{} f(e) \otimes G_h(e), \qquad (3.27)$$
where the convergence is in the IMSE sense.
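The bandwidth relation behind Proposition 3.1 follows from the fact that convolving two Gaussian kernels adds their squared bandwidths, so $h = \sqrt{h_{\text{fat}}^2 - h_{\text{IMSE}}^2}$. The following minimal NumPy sketch checks Eqs. (3.26)-(3.27) numerically; the error sample, bandwidths, and grid are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
errors = rng.normal(0.0, 1.0, size=500)     # sample of classifier errors e_i
h_imse, h_fat = 0.25, 0.60                  # h_fat > h_IMSE, as in Eq. (3.26)
h = np.sqrt(h_fat**2 - h_imse**2)           # kernel width from Proposition 3.1

def gauss(x, s):
    """Gaussian kernel G_s(x)."""
    return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

e = np.linspace(-5.0, 5.0, 2001)
de = e[1] - e[0]

f_n   = gauss(e[:, None] - errors[None, :], h_imse).mean(axis=1)  # mu_n (*) G_{h_IMSE}
f_fat = gauss(e[:, None] - errors[None, :], h_fat).mean(axis=1)   # mu_n (*) G_{h_fat}

# f_n (*) G_h, computed on the same grid (Riemann-sum convolution)
f_conv = np.convolve(f_n, gauss(e, h), mode="same") * de

print(np.max(np.abs(f_fat - f_conv)))   # tiny (~1e-6): the two estimates coincide
```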
The estimate $f_{\text{fat}}(e)$ is oversmoothed compared to the one converging to $f(e)$. This is unimportant, since we are not really interested in $f_n(e)$ itself (in particular, we do not use it to compute error rates). Our sole interest is in obtaining the right classifier parameter values ($d$ and $\sigma$ in Examples 3.1 and 3.2) corresponding to $\min P_e$.
3.2 The Linear Discriminant
Linear discriminants are basic building blocks in data classification. The linear discriminant implements the following classifier family:

$$Z_W = \theta(\mathbf{w}^T \mathbf{x} + w_0); \quad \mathbf{w} \in \mathbb{R}^d,\ w_0 \in \mathbb{R},\ W \subset \mathbb{R}^{d+1}, \qquad (3.28)$$
where $\mathbf{w}$ and $w_0$ are the classifier parameters, usually known as the weight vector and bias term, respectively, and $\theta(\cdot)$ is the usual classifier thresholding function yielding class codes. We restrict our analysis of the linear discriminant here to the case where the inputs are Gaussian distributed; this will be enough to demonstrate the sub-optimal behavior of MEE for this type of classifier.
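As a concrete illustration of the family in Eq. (3.28), the following sketch implements the linear discriminant with a $\{-1, +1\}$ thresholding function; the specific class codes and the example inputs are assumptions for illustration only:

```python
import numpy as np

def linear_discriminant(x, w, w0):
    """Class code theta(w^T x + w0) for each row of x."""
    return np.where(x @ w + w0 >= 0.0, 1, -1)

# usage with d = 2
w, w0 = np.array([1.0, -0.5]), 0.2
x = np.array([[0.3, 1.5],
              [2.0, 0.1]])
print(linear_discriminant(x, w, w0))   # [-1  1]
```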
3.2.1 Gaussian Inputs
To derive the error PDF for Gaussian inputs $\mathbf{x}_i$ we take into account that Gaussianity is preserved under linear transformations: if $X$, with realizations $\mathbf{x} = [x_1 \ldots x_d]^T$, has a multivariate Gaussian distribution with mean $\boldsymbol{\mu}$ and covariance $\Sigma$, $X \sim g(\mathbf{x}; \boldsymbol{\mu}, \Sigma)$, then

$$Y = \mathbf{w}^T X + w_0 \sim g(y;\ \mathbf{w}^T \boldsymbol{\mu} + w_0,\ \mathbf{w}^T \Sigma \mathbf{w}). \qquad (3.29)$$
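The preservation of Gaussianity in (3.29) is easy to check empirically by sampling; all numeric values below ($\boldsymbol{\mu}$, $\Sigma$, $\mathbf{w}$, $w_0$) are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(1)
mu    = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
w, w0 = np.array([0.8, -0.3]), 0.5

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ w + w0                        # the linear discriminant output

print(Y.mean(), w @ mu + w0)          # both ~ 1.9
print(Y.var(),  w @ Sigma @ w)        # both ~ 1.13
```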
Therefore, the class-conditional error PDFs, $f_{E|t}(e)$, are also Gaussian, and we deal with an error PDF setting similar to the one of Examples 3.1 and 3.2:

$$f_{Y|t}(y) = g\!\left(y;\ \mathbf{w}^T \boldsymbol{\mu}_{X|t} + w_0,\ \mathbf{w}^T \Sigma_{X|t} \mathbf{w}\right) \qquad (3.30)$$
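Since the class-conditional output densities of Eq. (3.30) are one-dimensional Gaussians, they can be evaluated directly, and with equal priors their overlap gives the minimum error attainable by thresholding $Y$. A sketch for a two-class problem follows; all class means, covariances, and weights are made-up illustrations:

```python
import numpy as np

def gauss(y, m, v):
    """Univariate Gaussian pdf g(y; m, v), v being the variance."""
    return np.exp(-0.5 * (y - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)

w, w0 = np.array([1.0, 1.0]), 0.0
mu    = {-1: np.array([-1.0, 0.0]), +1: np.array([1.0, 0.0])}
Sigma = {-1: np.eye(2),             +1: np.eye(2)}

y = np.linspace(-6.0, 6.0, 1201)
f = {t: gauss(y, w @ mu[t] + w0, w @ Sigma[t] @ w) for t in (-1, +1)}

# with equal priors, the minimum error attainable by thresholding Y is the
# overlap area of the two class-conditional densities
p_e = 0.5 * np.minimum(f[-1], f[+1]).sum() * (y[1] - y[0])
print(p_e)   # ~ 0.24
```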