2. For moderate $h$ (say $h = 2$ as in Fig. 5.5e), $\psi_{ZED}$ shows a sigmoidal-type shape and, as in $\psi_{MSE}$ and $\psi_{CE}$, larger errors contribute larger weights.
Note, however, the contrast with $\psi_{CE}$: for larger errors $\psi_{CE}$ “accelerates” the weight value while $\psi_{ZED}$ “decelerates” it.
3. For larger values of $h$, $\psi_{ZED}$ behaves like $\psi_{MSE}$, as illustrated in Fig. 5.5f. In fact, $\lim_{h \to +\infty} \psi_{ZED} = \psi_{MSE}$.
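This limiting behavior can be checked numerically. Assuming for illustration the Gaussian-kernel form $\psi_{ZED}(e) \propto e\,e^{-e^2/(2h^2)}$ (normalization constants omitted; the exact expression in the text may differ by constant factors), the ratio to the MSE weight $\psi_{MSE}(e) \propto e$ tends to 1 as $h$ grows, while for small $h$ the exponential factor produces the “decelerating” behavior noted above:

```python
import math

def psi_zed(e, h):
    # Assumed Gaussian-kernel form of the ZED weight (constants omitted):
    # the factor exp(-e^2 / (2 h^2)) "decelerates" the weight for large |e|.
    return e * math.exp(-e * e / (2 * h * h))

def psi_mse(e):
    # MSE weight grows linearly with the error
    return e

for h in (0.5, 2.0, 100.0):
    # Ratio psi_zed / psi_mse = exp(-e^2 / (2 h^2)); approaches 1 as h -> infinity
    ratios = [psi_zed(e, h) / psi_mse(e) for e in (0.5, 1.0, 1.5)]
    print(h, [round(r, 4) for r in ratios])
```

For moderate $h$ the ratio decays with $|e|$ (deceleration), whereas for large $h$ it is essentially 1 for all errors of interest, i.e., $\psi_{ZED}$ becomes indistinguishable from $\psi_{MSE}$.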
Despite the disadvantage of $R_{ZED}$ over $R_{MSE}$ and $R_{CE}$ in having to set $h$, it is important to emphasize that we are not concerned with obtaining a good estimate of $f_E(0)$, but only with forcing it to be as high as possible. This means that we can set some moderately high value for $h$, with the advantage of being able to adapt it, and thus control how $\psi_{ZED}$ behaves, for each classification problem at hand.
Moreover, the second basic behavior above suggests that the “decelerated” characteristic of $\psi_{ZED}$ reduces the sensitivity of $R_{ZED}$ to outliers (the degree of sensitivity being controlled by $h$) when compared to the other alternative risks. This is illustrated in the following example.
Example 5.5. Consider discriminating two classes with bivariate input data $\mathbf{x} = [x_1\ x_2]^T$, with circular uniform distribution (see Example 3.8 in Sect. 3.3.1) and the following parameters:
$$\mu_{-1} = [0\ 0]^T, \quad \mu_{1} = [1.1\ 0]^T, \quad r_{-1} = r_{1} = 1. \qquad (5.26)$$
By symmetry, the theoretically optimal linear discriminant is orthogonal to $x_1$ at the decision threshold $d = w_0/w_1 = 0.55$, with $\min P_e = 0.1684$.
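These values can be cross-checked: for a uniform distribution on a unit disk, the probability mass beyond a vertical line $x_1 = a$ (measured from the disk center) is the circular-segment area $(\arccos a - a\sqrt{1-a^2})/\pi$. The following quick verification sketch (not from the text) evaluates this at the midpoint threshold:

```python
import math

def segment_prob(a, r=1.0):
    # P(x1 > a) for a uniform density on a disk of radius r centered at x1 = 0:
    # circular-segment area divided by the disk area pi * r^2
    t = a / r
    return (math.acos(t) - t * math.sqrt(1.0 - t * t)) / math.pi

# The threshold d = 0.55 lies midway between the class means [0 0]^T and [1.1 0]^T;
# by symmetry both classes contribute the same error mass (equal priors assumed)
pe = segment_prob(0.55)
print(round(pe, 4))  # 0.1684
```

The result matches the quoted $\min P_e = 0.1684$.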
Suppose that a training set from the said distributions with $n$ instances per class was available, which for whatever reason was “contaminated” by the addition to class $\omega_{-1}$ of $n_0$ instances, $n_0 \ll n$, with uniform distribution in $]1, 1+l]$ along $x_1$. Figure 5.6 shows an example of such a dataset with $n = 200$ instances per class and $n_0 = 10$ outliers uniformly distributed in $]1, 1+l]$ with $l = 0.2$ (solid circles extending beyond $x_1 = 1$). Also shown is a linear discriminant adjusted by an $R_{ZED}$ perceptron trained with $h = 1$ (fat estimation of the error PDF) during 80 epochs with $\eta = 0.001$.
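Such a contaminated dataset might be generated as follows (a sketch under stated assumptions: the outliers' $x_2$ coordinate is set to 0, since the text only specifies their distribution along $x_1$, and the function names are illustrative):

```python
import math
import random

def disk_sample(center, r=1.0):
    # Uniform point on a disk: radius scaled by sqrt(U) keeps the density uniform
    rho = r * math.sqrt(random.random())
    theta = 2 * math.pi * random.random()
    return (center[0] + rho * math.cos(theta), center[1] + rho * math.sin(theta))

def make_dataset(n=200, n0=10, l=0.2, seed=0):
    random.seed(seed)
    cls_neg = [disk_sample((0.0, 0.0)) for _ in range(n)]
    cls_pos = [disk_sample((1.1, 0.0)) for _ in range(n)]
    # Outliers added to class -1: x1 uniform in ]1, 1+l]; x2 = 0 is an assumption
    outliers = [(1.0 + l * (1.0 - random.random()), 0.0) for _ in range(n0)]
    return cls_neg + outliers, cls_pos

neg, pos = make_dataset()
print(len(neg), len(pos))  # 210 200
```

Only the $n_0$ outlier points of class $\omega_{-1}$ extend beyond $x_1 = 1$, matching the solid circles in Fig. 5.6.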
In order to investigate the influence of the $n_0$ outliers on the determination of the decision threshold $d$, we proceed as follows: we repeat $n_{\exp}$ times the experiment of randomly generating datasets with $2n + n_0$ instances ($n + n_0$ instances for class $\omega_{-1}$, and $n$ instances for class $\omega_{1}$) and train $R_{ZED}$ and $R_{MSE}$ perceptrons, always with the above settings (80 epochs, $\eta = 0.001$, $h = 1$). We do this for several values of $l$, governing the spread of the outliers.
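This protocol might be sketched as below, assuming a Gaussian-kernel estimate $\hat{f}_E(0)$ maximized by gradient ascent for a linear unit $y = w_1 x_1 + w_2 x_2 + w_0$ with targets $\pm 1$. It is a simplified stand-in rather than the text's implementation: the learning rate and epoch count are retuned for this parameterization (the quoted $\eta = 0.001$ belongs to the original setup), the threshold is read off as $d = -w_0/w_1$ (a sign-convention assumption), and the sample sizes are reduced for speed:

```python
import math
import random

def disk(c, r=1.0):
    # Uniform sample on a disk centered at c
    rho = r * math.sqrt(random.random())
    th = 2 * math.pi * random.random()
    return (c[0] + rho * math.cos(th), c[1] + rho * math.sin(th))

def train_zed(data, targets, h=1.0, eta=0.3, epochs=200):
    # Gradient ascent on a Gaussian-kernel estimate of f_E(0); the per-sample
    # weight is k = e * exp(-e^2 / (2 h^2)) (constant factors folded into eta)
    w = [0.1, 0.0, 0.0]  # [w1, w2, w0], small arbitrary start
    n = len(data)
    for _ in range(epochs):
        g = [0.0, 0.0, 0.0]
        for (x1, x2), t in zip(data, targets):
            e = t - (w[0] * x1 + w[1] * x2 + w[2])
            k = e * math.exp(-e * e / (2 * h * h))
            g[0] += k * x1
            g[1] += k * x2
            g[2] += k
        w = [wi + eta * gi / n for wi, gi in zip(w, g)]
    return w

def one_run(n, n0, l):
    neg = [disk((0.0, 0.0)) for _ in range(n)] + \
          [(1.0 + l * (1.0 - random.random()), 0.0) for _ in range(n0)]  # outliers, x2 = 0 assumed
    pos = [disk((1.1, 0.0)) for _ in range(n)]
    data = neg + pos
    targets = [-1.0] * len(neg) + [1.0] * len(pos)
    w = train_zed(data, targets)
    return -w[2] / w[0]  # decision threshold along x1 (sign convention assumed)

random.seed(1)
ds = [one_run(n=50, n0=5, l=0.2) for _ in range(20)]  # n_exp reduced from 500 for speed
mean_d = sum(ds) / len(ds)
std_d = math.sqrt(sum((d - mean_d) ** 2 for d in ds) / len(ds))
print(round(mean_d, 2), round(std_d, 2))
```

With mild contamination the average threshold stays in the vicinity of the theoretical $d = 0.55$, which is the behavior the following figure quantifies for the full-scale experiment.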
Figure 5.7 shows averages of $d \pm \mathrm{std}(d)$ in terms of $l$, obtained in $n_{\exp} = 500$ experiments, for datasets with $n = 200$ instances per class and two values of $n_0$: $n_0 = 10$ (Fig. 5.7a) and $n_0 = 20$ (Fig. 5.7b). The value $l = 1$ corresponds to the no-outlier case. The experimental results shown in Fig. 5.7 clearly indicate that the average $d$ for the $R_{ZED}$ perceptron (thick dashed line) is