2. For moderate h (say h ≈ 2 as in Fig. 5.5e), ψZED shows a sigmoidal-type
shape and, as in ψMSE and ψCE, larger errors contribute to larger weights.
Note, however, the contrast with ψCE: for larger errors ψCE "accelerates"
the weight value while ψZED "decelerates" it.
3. For larger values of h, ψZED behaves like ψMSE, as illustrated in Fig. 5.5f.
In fact, lim h→+∞ ψZED = ψMSE.
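The limiting behavior can be checked numerically. The sketch below assumes an illustrative ZED weighting of the form ψZED(e) ∝ e·exp(−e²/(2h²)) (the error-derivative of a Gaussian kernel centred on the error; the text's exact expression may differ by a positive constant), together with ψMSE(e) = e:

```python
import math

def zed_weight(e, h):
    # Assumed (illustrative) ZED error-weighting function:
    # proportional to e * exp(-e^2 / (2 h^2)).
    # For |e| >> h the weight decays toward zero ("decelerates");
    # as h grows it tends to the MSE weight e.
    return e * math.exp(-e**2 / (2.0 * h**2))

def mse_weight(e):
    # psi_MSE grows linearly with the error.
    return e

# Small h: large errors get small weights; large h: near-linear behavior.
for h in (0.5, 2.0, 100.0):
    print(h, [round(zed_weight(e, h), 4) for e in (0.5, 1.0, 3.0)])
```

For h = 100 the printed weights are essentially (0.5, 1.0, 3.0), i.e. the MSE weights, consistent with the limit above; for h = 0.5 the weight at e = 3 is practically zero, which is the "deceleration" that large errors experience.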
Despite the disadvantage of RZED over RMSE and RCE of having to set h,
it is important to emphasize that we are not concerned with obtaining a good
estimate of fE(0), but only with forcing it to be as high as possible. This means
that we can set some moderately high value for h, with the advantage of
adapting it, and thus controlling how ψZED behaves, for each classification
problem at hand.
Moreover, the second basic behavior above suggests that the "decelerated"
characteristic of ψZED gives RZED a reduced sensitivity to outliers (with the
degree of sensitivity controlled by h) when compared to the other alternative
risks. This is illustrated in the following example.
Example 5.5. Consider discriminating two classes with bivariate input data
x = [x1 x2]T, with circular uniform distribution (see Example 3.8 in
Sect. 3.3.1) and the following parameters:

μ1 = [0 0]T, μ−1 = [1.1 0]T, r1 = r−1 = 1.    (5.26)
By symmetry the theoretically optimal linear discriminant is orthogonal to
x1 at the decision threshold d = −w0/w1 = 0.55 and with min Pe = 0.1684.
Suppose that a training set from the said distributions with n instances
per class was available, which for whatever reason was "contaminated" by
the addition to class ω1 of n0 instances, n0 ≪ n, with uniform distribution
in ]1, 1+l] along x1. Figure 5.6 shows an example of such a dataset with
n = 200 instances per class and n0 = 10 outliers uniformly distributed in
]1, 1+l] with l = 0.2 (solid circles extending beyond x1 = 1). Also shown is a
linear discriminant adjusted by an RZED perceptron trained with h = 1 (a fat
estimation of the error PDF) during 80 epochs with η = 0.001.
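A contaminated dataset of this kind can be sketched as follows. The sampler for the circular uniform distribution is a standard construction (uniform angle, square-root radial density); since the text only specifies the outliers' x1 distribution, their x2 coordinates are here drawn uniformly in [−1, 1], which is an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

def circular_uniform(n, mu, r):
    # Sample n points uniformly from the disc of radius r centred at mu.
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    rad = r * np.sqrt(rng.uniform(0.0, 1.0, n))  # sqrt for uniform area density
    return np.column_stack([mu[0] + rad * np.cos(theta),
                            mu[1] + rad * np.sin(theta)])

n, n0, l = 200, 10, 0.2
X1 = circular_uniform(n, (0.0, 0.0), 1.0)    # class omega_1 at the origin
Xm1 = circular_uniform(n, (1.1, 0.0), 1.0)   # class omega_{-1}
# n0 outliers added to class omega_1, uniform in ]1, 1+l] along x1;
# the x2 spread of the outliers is an assumption (not given in the text).
outliers = np.column_stack([rng.uniform(1.0, 1.0 + l, n0),
                            rng.uniform(-1.0, 1.0, n0)])
X1 = np.vstack([X1, outliers])
```

The outliers sit just outside the ω1 disc (whose support ends at x1 = 1), which is why they show up in Fig. 5.6 as points extending beyond x1 = 1.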
In order to investigate the influence of the n0 outliers on the determination
of the decision threshold d, we proceed as follows: we repeat nexp times the
experiment of randomly generating datasets with 2n + n0 instances (n + n0
instances for class ω1, and n instances for class ω−1) and train RZED and
RMSE perceptrons, always with the above settings (80 epochs, η = 0.001,
h = 1). We do this for several values of l, which governs the spread of the outliers.
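The core of each experiment can be sketched as below. This is a minimal illustration, not the book's exact perceptron: it assumes a linear output y = w0 + w1·x1 + w2·x2 with targets ±1, batch gradient descent, and the illustrative ZED weighting e·exp(−e²/(2h²)) (constants absorbed into η); the actual activation function and gradient constants may differ:

```python
import numpy as np

def train(X, t, risk="zed", h=1.0, eta=0.001, epochs=80, seed=0):
    # Batch gradient descent on a linear discriminant; targets t in {-1, +1}.
    # Per-pattern error weight: psi_MSE(e) = e, or the assumed
    # psi_ZED(e) = e * exp(-e^2 / (2 h^2)) (constants folded into eta).
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=3)            # [w0, w1, w2]
    Xb = np.column_stack([np.ones(len(X)), X])   # prepend bias input
    for _ in range(epochs):
        e = Xb @ w - t                           # errors
        psi = e if risk == "mse" else e * np.exp(-e**2 / (2.0 * h**2))
        w -= eta * (Xb.T @ psi)
    return w

def threshold(w):
    # The boundary w0 + w1*x1 + w2*x2 = 0 crosses the x1 axis at d = -w0/w1.
    return -w[0] / w[1]
```

Repeating `train` over freshly generated datasets for each l, and averaging `threshold(w)` across runs, reproduces the kind of d-versus-l curves reported in Fig. 5.7.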
Figure 5.7 shows averages of d ± std(d) in terms of l, obtained in nexp = 500
experiments, for datasets with n = 200 instances per class and two values of
n0: n0 = 10 (Fig. 5.7a) and n0 = 20 (Fig. 5.7b). The value l = 0 corresponds
to the no-outlier case. The experimental results shown in Fig. 5.7 clearly
indicate that the average d for the RZED perceptron (thick dashed line) is