[Figure: weights $w_k$ ($w_0$, $w_1$, $w_2$) versus epochs]
Fig. 3.16 Evolution of the weights in an experiment where the perceptron was trained to solve the $\mu_1 = [1\ 0]^T$ close-classes case, for bivariate Gaussian inputs.
3.3.2.2 Realistic Datasets
Comparison of theoretical and empirical MEE behaviors has to be restricted to two-dimensional problems, so that graphical representations remain viable and the Nelder-Mead algorithm runs in reasonable time. One may, nonetheless, use dimensionally reduced real-world datasets. In what follows we consider the plane of the first two principal components (denoted $(x_1, x_2)$) of the original datasets, whenever these have more than two features.
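The projection onto the first two principal components can be sketched with a singular value decomposition of the centred data matrix. This is a minimal illustration, assuming the features are centred but not standardized (the text does not say whether scaling was applied); the function name `first_two_pcs` and the synthetic data are hypothetical.

```python
import numpy as np

def first_two_pcs(X):
    """Project an (n, d) data matrix onto its first two principal components.

    Returns an (n, 2) array of scores (x1, x2). Features are centred only;
    whether the original study also standardized them is an assumption left open.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each feature
    # Rows of Vt are the principal directions, ordered by decreasing singular value
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                         # scores on PC1 and PC2

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # synthetic 5-feature data
Z = first_two_pcs(X)
print(Z.shape)   # (200, 2)
```

The two score columns are uncorrelated, and the first carries at least as much variance as the second, which is what makes the $(x_1, x_2)$ plane the most informative two-dimensional view of the data.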
Since the true $(X_1, X_2)$ joint distributions are unknown, the true theoretical MEE solutions cannot be derived. One can still, however, derive theoretical MEE solutions for very closely resembling problems, as follows: first, model the bivariate real-world PDFs by appropriate distributions that have the same covariance matrices and whose marginal PDFs are at minimum $L_1$ distance from the data marginals; next, apply to these modeled PDFs the procedure outlined in Sect. 3.1.1 (numerical simulation).
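The $L_1$ distance between two densities $f$ and $g$ is $\int |f(x) - g(x)|\,dx$. On a uniform grid it can be approximated by a simple Riemann sum, as in the minimal sketch below; the helper name `l1_distance` and the grid limits are illustrative assumptions. As a check, for two unit-variance Gaussians whose means differ by 1 the exact value is $2\,(2\Phi(0.5) - 1)$.

```python
import numpy as np
from scipy.stats import norm

def l1_distance(f_vals, g_vals, x):
    """Approximate the L1 distance  integral |f(x) - g(x)| dx  on a uniform grid x."""
    dx = x[1] - x[0]
    return float(np.sum(np.abs(f_vals - g_vals)) * dx)

x = np.linspace(-10.0, 11.0, 20001)              # grid wide enough to hold both PDFs
d = l1_distance(norm.pdf(x, 0.0, 1.0), norm.pdf(x, 1.0, 1.0), x)
# Closed form for two unit-variance Gaussians whose means differ by 1
exact = 2.0 * (2.0 * norm.cdf(0.5) - 1.0)
print(d, exact)
```

Note that the $L_1$ distance between two PDFs is bounded by 2, which keeps it well scaled as a fitting criterion.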
PDF modeling of the marginal class-conditional distributions is achieved by first obtaining from the data the Parzen window estimates $f_{X|t}$, using the optimal $h_{IMSE}$ bandwidth. Next, suitable known PDFs (namely, Gaussian, Gamma, and Weibull) are fitted by minimizing the $L_1$ distance between $f_{X|t}$ and its model. The $L_1$ distance is preferred over other distance metrics (in particular, $L_2$) for reasons described in [53]. Finally, for the numerical computation of the theoretical MEE, one generates a large number of points with the modeled class-conditional distributions and with the same estimated covariance matrix.
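The marginal modeling step can be sketched as follows: build a Gaussian-kernel Parzen estimate, then fit each candidate family by minimizing the $L_1$ distance to it and keep the best. This is a sketch under several assumptions not stated in the text: the bandwidth uses the normal-reference IMSE-optimal formula $h = (4/(3n))^{1/5}\hat\sigma$ (the $h_{IMSE}$ estimator of the original work may differ), the SciPy maximum-likelihood fit serves as the starting point, and Nelder-Mead performs the minimization. The names `parzen_pdf`, `reference_bandwidth`, `FAMILIES` and `fit_min_l1` are hypothetical. Only the marginals are handled here; matching the covariance matrix is a separate step.

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def parzen_pdf(samples, x, h):
    """Gaussian-kernel Parzen window estimate of the PDF, evaluated on grid x."""
    u = (x[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * u**2).sum(axis=1) / (samples.size * h * np.sqrt(2.0 * np.pi))

def reference_bandwidth(samples):
    """Assumed IMSE-optimal bandwidth under a Gaussian reference density."""
    return (4.0 / (3.0 * samples.size)) ** 0.2 * np.std(samples, ddof=1)

# Candidate families; SciPy parameters are (shape(s)..., loc, scale)
FAMILIES = {"Gaussian": stats.norm, "Gamma": stats.gamma, "Weibull": stats.weibull_min}

def fit_min_l1(samples, grid_size=1000):
    """Fit each candidate family by minimizing the L1 distance to the Parzen estimate."""
    samples = np.asarray(samples, dtype=float)
    h = reference_bandwidth(samples)
    x = np.linspace(samples.min() - 4 * h, samples.max() + 4 * h, grid_size)
    dx = x[1] - x[0]
    f_hat = parzen_pdf(samples, x, h)

    def l1(dist, params):
        g = dist.pdf(x, *params)
        if not np.all(np.isfinite(g)):           # invalid params give nan/inf pdf values
            return 1e6
        return float(np.sum(np.abs(f_hat - g)) * dx)

    results = {}
    for name, dist in FAMILIES.items():
        start = dist.fit(samples)                # maximum-likelihood starting point
        res = minimize(lambda p: l1(dist, p), start, method="Nelder-Mead")
        results[name] = (float(res.fun), res.x)
    best = min(results, key=lambda k: results[k][0])
    return best, results

rng = np.random.default_rng(1)
data = rng.gamma(shape=3.0, scale=1.0, size=500)  # synthetic skewed marginal
best, results = fit_min_l1(data)
print(best, results[best][0])
```

Each family gets the same treatment, so the selected model is simply the one whose fitted PDF lies closest, in $L_1$, to the Parzen estimate.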
The datasets of Table 3.2 were analyzed in [219]. The datasets
WDBC (30 features), Thyroid (5 features), and Wine (13 features) are from
[13]; PB12, a dataset with 2 features, is from [110]. For the first three datasets
the first two principal components were computed. These new datasets were