\[
\frac{d^2 H_S}{dx^2}(x^*) = \ln\frac{P_1}{1-2P_1}\left[ q\,\frac{df_{X|-1}}{dx}(x^*) - p\,\frac{df_{X|1}}{dx}(x^*) \right] - \frac{2p^2 f_{X|1}^2(x^*)}{P_1}. \tag{4.32}
\]
This expression is quite intractable, as it involves general class-conditional
PDFs. Important simplifications can be made by considering the case of
mutually symmetric class distributions [216], that is, where $q f_{X|-1}(x)$
has the same shape as $p f_{X|1}(x)$. We readily see that
$P_{-1}(x^*) = P_1(x^*)$ for this type of two-class problem. In particular,
the Gaussian two-class problem with equal standard deviations and priors is
an example of this type of distribution. For mutually symmetric class
distributions one has
\[
q\,\frac{df_{X|-1}}{dx}(x^*) = -p\,\frac{df_{X|1}}{dx}(x^*); \tag{4.33}
\]
hence,
\[
\frac{d^2 H_S}{dx^2}(x^*) = -2\left[ p\,\frac{df_{X|1}}{dx}(x^*)\,\ln\frac{P_1}{1-2P_1} + \frac{p^2 f_{X|1}^2(x^*)}{P_1} \right]. \tag{4.34}
\]
Let
\[
Q(x^*) = \frac{df}{dx}\,\ln\frac{P}{1-2P} + \frac{f^2}{P}, \tag{4.35}
\]
where for notational simplicity we took $f \equiv p f_{X|1}(x^*)$ and
$P \equiv P_1(x^*)$, so that the second derivative in (4.34) equals
$-2Q(x^*)$. The function $Q(x^*)$ plays a key role in the analysis of the
error entropy critical points, as carefully discussed in [216]. In fact, for
increasingly distant classes (increasing $d$ with $x^* \to -\infty$)
$Q(x^*)$ can be shown to be negative, and thus error entropy has a minimum at
$x^*$. On the other hand, if the classes get closer (decreasing $d$ with
$x^* \to 0$) $Q(x^*)$ may change sign, and thus $x^*$ turns out to be a
maximizer of error entropy. Such is the case when the mutually symmetric
class distributions are themselves symmetric, as in problems with Gaussian
classes.
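The sign behaviour of $Q(x^*)$ can be explored numerically. The following is a minimal sketch, not the authors' procedure, under assumptions that are not stated in the text: class 1 is taken as Gaussian with mean 0 and standard deviation `sigma`, class -1 as Gaussian with mean $-d$ and the same standard deviation, the priors are equal ($p = q = 1/2$), and therefore $x^* = -d/2$. The function names `q_gaussian` and `turnabout_point` are illustrative only.

```python
import math


def std_normal_pdf(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(u):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


def q_gaussian(x_star, sigma=1.0, p=0.5):
    """Q(x*) of (4.35), assuming class 1 ~ N(0, sigma^2) and x* < 0."""
    u = x_star / sigma
    f = p * std_normal_pdf(u) / sigma      # f = p f_{X|1}(x*)
    df = -x_star / sigma ** 2 * f          # derivative of f at x*
    P = p * std_normal_cdf(u)              # P = P_1(x*) = p F_{X|1}(x*)
    return df * math.log(P / (1.0 - 2.0 * P)) + f * f / P


def turnabout_point(sigma=1.0, p=0.5, iters=200):
    """Bisection for a zero of Q between distant and close classes."""
    lo, hi = -6.0 * sigma, -1e-6 * sigma   # assumed search bracket
    if not (q_gaussian(lo, sigma, p) < 0.0 < q_gaussian(hi, sigma, p)):
        raise ValueError("Q does not change sign on the assumed bracket")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if q_gaussian(mid, sigma, p) < 0.0:
            lo = mid                       # still on the minimum (MEE) side
        else:
            hi = mid                       # already on the maximum side
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for x_star in (-3.0, -1.0, -0.5, -0.1):
        print(x_star, q_gaussian(x_star))
    print("zero of Q:", turnabout_point())
```

Under these assumptions, a negative value of `q_gaussian` corresponds to an error entropy minimum at $x^*$ and a positive value to a maximum. Calling `turnabout_point` with another `sigma` is a quick way to see how the zero of $Q$ moves with the scale of the data.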
Summarizing, for a large class of problems optimization of SEE performs
optimally, whether in the sense of minimization (MEE) for sufficiently
separated classes or maximization for close classes. Of course, it would also
be relevant to know which optimization strategy to choose, that is, to know
the minimum-to-maximum turn-about distance between the classes.
Unfortunately, the answer has to be searched case by case for each pair of
mutually symmetric distributions, taking into account that for a scaled
version of $X$, $Y = \Delta X$, the ratio $x^*/\Delta$ between the
$Q(x^*) = 0$ solution and the scale $\Delta$ is a constant. In fact, one has