We remark that the proof has only analyzed $\partial H_S / \partial w_0$ because the other derivatives (and consequently the complete gradient $\nabla H_S$) are rather intricate.
Thus, equality of class error probabilities is just a necessary condition. The
following example illustrates this result.
Example 4.5.
Consider the perceptron implementing the family of lines $w_1 x_1 + w_2 x_2 + w_0 = 0$ to discriminate between two bivariate Gaussian classes. First, let $\mu_{\pm 1} = [\pm 2 \;\; 0]^T$ and $\Sigma_1 = \Sigma_{-1} = I$. The optimal solution is given (as a function of $p$) by the vertical line with equation

$$x_1 = \frac{1}{4} \ln \frac{1-p}{p}.$$

The optimal set of parameters must satisfy $w_2 = 0$ and $w_0 = -\frac{w_1}{4} \ln \frac{1-p}{p}$. Additionally, $w_1$ must be positive to give the correct class orientation. One can then numerically determine that $\nabla H_S(w^*) = 0$ only if $p = 1/2$, which corresponds to the class setting with equal class error probabilities.
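The numerical determination mentioned above can be sketched in a few lines of Python. This is only an illustrative check, not the authors' procedure. It assumes that $p$ is the prior of class $1$ (mean $[2 \;\; 0]^T$), that the discrete error takes three values with probabilities $P_{-1}$, $1 - P_{-1} - P_1$ and $P_1$, and that $H_S$ is the Shannon entropy of that distribution. The helper names `phi`, `error_entropy` and `gradient` are hypothetical.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def error_entropy(w1, w2, w0, p, mu=2.0):
    """Shannon entropy of the discrete error for classes N([+-mu, 0], I).

    Assumption: p is the prior of class 1 (mean [+mu, 0]) and the error
    has three outcomes with probabilities P_1, P_-1 and 1 - P_1 - P_-1.
    """
    s = math.hypot(w1, w2)                # std of U = w^T x + w0 when Sigma = I
    f_pos = phi(-(w1 * mu + w0) / s)      # P(U <= 0 | t = 1)
    f_neg = phi(-(-w1 * mu + w0) / s)     # P(U <= 0 | t = -1)
    probs = (p * f_pos, (1.0 - p) * (1.0 - f_neg))
    probs += (1.0 - sum(probs),)          # probability of a correct decision
    return -sum(q * math.log(q) for q in probs if q > 0.0)

def gradient(p, w1=1.0, h=1e-5):
    """Central-difference gradient of H_S at the optimal weights for prior p."""
    w = [w1, 0.0, -(w1 / 4.0) * math.log((1.0 - p) / p)]
    grad = []
    for i in range(3):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        grad.append((error_entropy(*up, p) - error_entropy(*down, p)) / (2.0 * h))
    return grad

for p in (0.3, 0.5, 0.7):
    print(p, [round(g, 6) for g in gradient(p)])
```

Under these assumptions the finite-difference gradient vanishes (up to numerical noise) for $p = 1/2$ and has a clearly nonzero $w_0$ component for the other priors.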
If we now assume $p = 1/2$ and $\Sigma_1 = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}$, the optimal solution is

$$x_1 = -6 + \sqrt{32 + 2\ln(2)}.$$

The error probabilities are unequal, $P_{-1} \approx 0.019$ and $P_1 \approx 0.029$, and

$$\nabla H_S\left(w_1, 0, w_1\left(6 - \sqrt{32 + 2\ln(2)}\right)\right) \neq 0. \tag{4.60}$$

More precisely, $\frac{\partial H_S}{\partial w_1} < 0$, $\frac{\partial H_S}{\partial w_2} = 0$ and $\frac{\partial H_S}{\partial w_0} > 0$ at the possible optimal solutions. Therefore, the optimal solution is not a critical point of the error entropy.
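The figures quoted for the unequal-covariance case can be reproduced with a short Python sketch. It is a hedged check under stated assumptions: the means are kept at $[\pm 2 \;\; 0]^T$ as in the first part of the example, $\Sigma_{-1} = I$, $H_S$ is taken to be the Shannon entropy of the three-valued discrete error, and the names `class_errors`, `entropy` and `gradient` are hypothetical. The threshold follows from equating the two equally weighted marginal densities along $x_1$, which gives $x_1^2 + 12 x_1 + 4 = 2\ln(2)$.

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Root of x^2 + 12 x + 4 = 2 ln(2) lying between the two class means.
x_star = -6.0 + math.sqrt(32.0 + 2.0 * math.log(2.0))

def class_errors(w1, w2, w0):
    """Joint error probabilities (P_-1, P_1) for equal priors."""
    s_pos = math.sqrt(2.0 * w1**2 + w2**2)  # sqrt(w^T Sigma_1 w), Sigma_1 = diag(2, 1)
    s_neg = math.hypot(w1, w2)              # sqrt(w^T Sigma_-1 w), Sigma_-1 = I
    p_pos = 0.5 * phi(-(2.0 * w1 + w0) / s_pos)
    p_neg = 0.5 * (1.0 - phi(-(-2.0 * w1 + w0) / s_neg))
    return p_neg, p_pos

def entropy(w1, w2, w0):
    """Shannon entropy of the discrete error (assumed definition of H_S)."""
    p_neg, p_pos = class_errors(w1, w2, w0)
    probs = (p_neg, p_pos, 1.0 - p_neg - p_pos)
    return -sum(q * math.log(q) for q in probs)

def gradient(w, h=1e-5):
    """Central-difference gradient of the entropy at w = [w1, w2, w0]."""
    grad = []
    for i in range(3):
        up, down = list(w), list(w)
        up[i] += h
        down[i] -= h
        grad.append((entropy(*up) - entropy(*down)) / (2.0 * h))
    return grad

w_star = [1.0, 0.0, -x_star]   # w1 = 1, w2 = 0, w0 = -w1 * x_star
print("threshold:", x_star)
print("P_-1, P_1:", class_errors(*w_star))
print("gradient :", gradient(w_star))
```

Choosing $w_1 = 1$ is arbitrary; any positive $w_1$ with $w_0 = -w_1 x_1$ describes the same line, and the signs of the gradient components do not depend on this choice.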
The above example indicates that it suffices from now on to analyze the case of bivariate Gaussian class distributions to get a picture of the discrete MEE (SEE) behavior regarding the optimality issue. Recall from Sect. 3.3.1 that Gaussianity is preserved under linear transformations. Therefore, if the classes have means $\mu_t$ and covariances $\Sigma_t$ for $t \in \{-1, 1\}$, it is straightforward to obtain

$$F_{U|t}(0) = \Phi\left(-\frac{w^T \mu_t + w_0}{\sqrt{w^T \Sigma_t w}}\right). \tag{4.61}$$
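For equal priors one gets

$$P_{-1} = \frac{1}{2}\left[1 - \Phi\left(-\frac{w^T \mu_{-1} + w_0}{\sqrt{w^T \Sigma_{-1} w}}\right)\right]; \qquad P_1 = \frac{1}{2}\,\Phi\left(-\frac{w^T \mu_1 + w_0}{\sqrt{w^T \Sigma_1 w}}\right). \tag{4.62}$$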
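Expressions (4.61) and (4.62) translate directly into code. The sketch below assumes that $U = w^T x + w_0$ is the perceptron output before thresholding; the function names are hypothetical. It also compares (4.61) with a simple Monte Carlo estimate for a diagonal covariance, using arbitrary illustrative weights that do not come from the text.

```python
import math
import random

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cdf_at_zero(w, w0, mu, sigma):
    """Eq. (4.61): F_{U|t}(0) for U = w^T x + w0 with x ~ N(mu, sigma)."""
    n = len(w)
    mean_u = sum(wi * mi for wi, mi in zip(w, mu)) + w0
    var_u = sum(w[i] * sigma[i][j] * w[j] for i in range(n) for j in range(n))
    return phi(-mean_u / math.sqrt(var_u))

def equal_prior_errors(w, w0, mu_neg, sigma_neg, mu_pos, sigma_pos):
    """Eq. (4.62): joint error probabilities (P_-1, P_1) for priors 1/2."""
    p_neg = 0.5 * (1.0 - cdf_at_zero(w, w0, mu_neg, sigma_neg))
    p_pos = 0.5 * cdf_at_zero(w, w0, mu_pos, sigma_pos)
    return p_neg, p_pos

def monte_carlo_cdf(w, w0, mu, variances, n=100_000, seed=0):
    """Empirical P(U <= 0) for a diagonal covariance (illustrative check only)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = [rng.gauss(m, math.sqrt(v)) for m, v in zip(mu, variances)]
        u = w0 + sum(wi * xi for wi, xi in zip(w, x))
        hits += u <= 0.0
    return hits / n

# Arbitrary illustrative weights and a class with covariance diag(2, 1).
w, w0 = [1.0, -0.5], 0.3
mu, variances = [2.0, 0.0], [2.0, 1.0]
exact = cdf_at_zero(w, w0, mu, [[2.0, 0.0], [0.0, 1.0]])
approx = monte_carlo_cdf(w, w0, mu, variances)
print("closed form:", exact, "Monte Carlo:", approx)
```

The Monte Carlo part is restricted to diagonal covariances only to keep the sampler free of a matrix factorization; the closed-form functions accept any symmetric positive definite $\Sigma_t$.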
Unfortunately, these expressions lead to a rather intricate entropy formula and to equally intricate derivatives. Let us consider spherical distributions with $\Sigma_{-1} = \Sigma_1 = I$, to obtain a linear (optimal) solution and, in order to simplify