If one convolves the error PDFs with a Gaussian kernel, $G_h$, something
similar to Example 3.2 happens: kernel smoothing removes the theoretical
infinite maximum at the origin, and a potential hill emerges along
$w_2 = 0$. Figure 3.20a shows the potential surface corresponding to
$V_{R_2}(G_{1.5} \otimes f_E; w_1, w_2)$ for the same $w_0 = 0$ setting as
in Fig. 3.18.
A potential hill is also evident when the empirical potential is computed
with fat estimation from the error samples. Figure 3.20b shows the empirical
potential surface, $V_{R_2}(f_E; w_1, w_2)$, based on fat estimation of $f_E$
for an experimental run with 100 error samples and bandwidth $h = 1$.
Gradient ascent on this surface provides a good estimate of the optimal
hyperplane, converging to a solution around $w = [4 \;\; 0]^T$, with
practically zero gradient thereafter along the $w_2 = 0$ crest.
[Figure 3.20: two surface plots, panels (a) and (b), of the potential over the $(w_1, w_2)$ plane, with both weight axes spanning -10 to 10.]
Fig. 3.20 Surfaces of the information potential for the arctan perceptron:
a) $V_{R_2}(G_{1.5} \otimes f_E; w_1, w_2)$; b) $V_{R_2}(f_E; w_1, w_2)$ based
on fat estimation of $f_E$ for 100 error samples (kernel bandwidth $h = 1$).
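The empirical surface in Fig. 3.20b can be approximated numerically. The following is a minimal sketch, not the authors' code. It assumes the usual kernel plug-in estimate of the quadratic information potential, in which a Gaussian-kernel density estimate of bandwidth $h$ turns the integral of $f_E^2$ into a double sum of Gaussians of variance $2h^2$ over all pairwise error differences. It also assumes targets $t \in \{-1, 1\}$ and errors $e = t - \arctan(w^T x)$ with $w_0 = 0$. The function names `information_potential` and `potential_surface` are hypothetical.

```python
import numpy as np

def information_potential(errors, h=1.0):
    """Kernel estimate of the quadratic information potential of 1-D errors.

    Assumption: a Gaussian kernel of bandwidth h is used, so the estimate is
    the mean of Gaussians of variance 2*h**2 evaluated at all pairwise
    differences e_i - e_j.
    """
    e = np.asarray(errors, dtype=float)
    diffs = e[:, None] - e[None, :]      # all pairwise differences e_i - e_j
    var = 2.0 * h**2                     # variance of G_h convolved with G_h
    kernel = np.exp(-diffs**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return kernel.mean()                 # (1/n^2) * sum over i and j

def potential_surface(X, t, w1_grid, w2_grid, h=1.0):
    """Evaluate the empirical potential on a (w1, w2) grid with w0 = 0."""
    surface = np.empty((len(w1_grid), len(w2_grid)))
    for i, w1 in enumerate(w1_grid):
        for j, w2 in enumerate(w2_grid):
            y = np.arctan(X @ np.array([w1, w2]))   # arctan perceptron output
            surface[i, j] = information_potential(t - y, h)
    return surface
```

Calling `potential_surface` with grids over $[-10, 10]$ for both weights yields an array that can be rendered as a surface plot. A larger `h` gives a smoother surface, which is the effect that fat estimation relies on.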
When applied to Gaussian data with equal class covariance, the arctan
perceptron with information potential risk (or, equivalently, with quadratic
Rényi EE risk) behaves adequately, in the sense that it attains $\min P_e$
solutions. Table 3.3 shows the results of a single experiment for the unit
covariance case we have been discussing, with gradient ascent applied to the
information potential for 65 epochs. The data consisted of 2000 instances per
class and the kernel bandwidth was $h = 1$; optimal PDF estimation would
require $h = 0.31$, so fat estimation is indeed being used. The weight values
shown in Table 3.3 are normalized to $w_1 = 1$. The error rates at the end of
the 65 epochs, together with the Bayes optimal $\min P_e$ values, are also
shown. The arctan perceptron is clearly producing solutions close to the
optimal ones.
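An experiment of this kind can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure behind Table 3.3. The class means $\pm[1 \;\; 0]^T$, the learning rate `eta`, the fixed bias $w_0 = 0$, the targets $t \in \{-1, 1\}$, and all function names are hypothetical choices made for the example. The gradient follows from differentiating the pairwise Gaussian sum with respect to $w$, using $e = t - \arctan(w^T x)$.

```python
import numpy as np

def potential_and_grad(w, X, t, h=1.0):
    """Empirical information potential and its gradient with respect to w."""
    a = X @ w                                  # perceptron activations (w0 = 0)
    e = t - np.arctan(a)                       # errors of the arctan perceptron
    de_dw = -X / (1.0 + a**2)[:, None]         # de_i/dw, one row per sample
    d = e[:, None] - e[None, :]                # pairwise error differences
    var = 2.0 * h**2                           # variance of the pairwise kernel
    G = np.exp(-d**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    dG = -d / var * G                          # kernel derivative w.r.t. d
    n = e.size
    grad = 2.0 * (dG.sum(axis=1) @ de_dw) / n**2
    return G.mean(), grad

def gradient_ascent(X, t, w_init, h=1.0, eta=0.5, epochs=65):
    """Plain batch gradient ascent on the empirical potential."""
    w = np.array(w_init, dtype=float)
    for _ in range(epochs):
        _, g = potential_and_grad(w, X, t, h)
        w = w + eta * g                        # ascent: move along the gradient
    return w

def make_data(n_per_class, rng):
    """Two Gaussian classes with unit covariance (means are an assumption)."""
    mu = np.array([1.0, 0.0])                  # hypothetical class mean
    X = np.vstack([rng.standard_normal((n_per_class, 2)) - mu,
                   rng.standard_normal((n_per_class, 2)) + mu])
    t = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
    return X, t

def error_rate(w, X, t):
    """Fraction of samples on the wrong side of the hyperplane w.x = 0."""
    return float(np.mean(np.sign(X @ w) != t))
```

A run would call `make_data`, then `gradient_ascent`, then `error_rate`, and divide the resulting weights by `w[0]` to obtain the normalization used in the table. The pairwise matrices grow quadratically with the sample count, so 2000 instances per class need several hundred megabytes in this naive form.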
Table 3.4 shows one-experiment results when the covariance is
$$\Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 2 \end{bmatrix}. \tag{3.69}$$
 