As an example of a Dirac-δ comb for the error PDF, we could have Y_k = b, i.e., assigning all input instances to ω_k; in this case, f_E(e) = (1/2) δ(e − (a − b)) + (1/2) δ(e).
A single Dirac-δ for the error PDF, located at the origin, can only be obtained when all errors are zero. This is an ideal min P_e = 0 situation, demanding a classifier function family sufficiently rich to allow such an error PDF and, in the case of iterative training, an algorithm which does indeed guarantee the convergence of the error PDF towards the single Dirac-δ at the origin (E = 0).
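To make the two-δ comb above concrete, here is a minimal derivation sketch, under the assumptions (ours, for illustration) of two classes with target values a and b, equal priors, and the error taken as target minus output:

\begin{align*}
Y &\equiv b \quad (\text{all instances assigned to } \omega_k), \qquad E = T - Y, \quad T \in \{a, b\},\\
e &= b - b = 0 \quad \text{for instances of } \omega_k \ (\text{probability } 1/2),\\
e &= a - b \quad \text{for instances of the other class} \ (\text{probability } 1/2),\\
f_E(e) &= \tfrac{1}{2}\,\delta\big(e - (a - b)\big) + \tfrac{1}{2}\,\delta(e).
\end{align*}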
2.3.2 EE Risks and Information Theoretic Learning
The use of the entropy concept in data classification is not new. It has notably been used as a measure of efficient dataset partition in clustering applications and as a node splitting criterion in decision trees, to be described in a later section [237, 33]. It has also been proposed for classifier selection [237, 238]. The novelty here lies in its use as a risk functional and in the analysis of how it performs, both theoretically and experimentally, when applied to classifier design.
The MEE approach to classifier design fits into the area of Information Theoretic Learning (ITL), a research area enjoying growing interest and promising important advances in several applications. The introduction of information theoretic measures in learning machines can be traced back at least to [237, 238] and to [140]; the latter author introduced the maximization of mutual information between input and output of a neural network (the infomax principle) as an unsupervised method that can be applied, for instance, to feature extraction (see also [49]). A real blossoming of ITL in the areas of learning systems and signal processing came in more recent years, when J.C. Príncipe and co-workers built a large body of theoretical and experimental results on entropic criteria applied to both areas. In particular, they proposed the minimization of Rényi's quadratic entropy of data errors for solving regression problems [66], time series prediction [67], feature extraction [84, 117], and blind source separation [99, 65, 67] using adaptive systems. These and more recent advances on ITL issues and applications can be found in the monograph [174].
The rationale behind the application of MEE to learning systems performing a regression task is as follows: the MEE approach implies a reduction of the expected information contained in the error, leading to the maximization of the mutual information between the desired target and the system output [66, 67]. This means that the system is learning the target variable.
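For readers who want to experiment with the entropic criterion mentioned above, the following sketch (our own illustration; the function name, kernel width, and toy data are not from the text) computes the standard Parzen-window estimate of Rényi's quadratic entropy of a set of errors, the quantity whose minimization is proposed in [66, 67]:

import numpy as np

def renyi_quadratic_entropy(errors, sigma=0.5):
    """Parzen-window estimate of Renyi's quadratic entropy of the errors.

    H_R2 = -log( (1/N^2) * sum_ij G_{sigma*sqrt(2)}(e_i - e_j) ),
    where G is a Gaussian kernel and the double sum is the so-called
    information potential. Lower H_R2 means a more concentrated error PDF.
    """
    e = np.asarray(errors, dtype=float).ravel()
    diffs = e[:, None] - e[None, :]              # all pairwise error differences
    s2 = 2.0 * sigma ** 2                        # variance of G with std sigma*sqrt(2)
    kernel = np.exp(-diffs ** 2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)
    information_potential = kernel.mean()        # (1/N^2) * double sum
    return -np.log(information_potential)

# Toy check: errors concentrated near zero have lower quadratic entropy
# than errors spread over two distant values (a two-delta-like comb).
rng = np.random.default_rng(0)
concentrated = 0.05 * rng.standard_normal(200)
two_lumps = np.concatenate([0.05 * rng.standard_normal(100),
                            1.0 + 0.05 * rng.standard_normal(100)])
print(renyi_quadratic_entropy(concentrated), renyi_quadratic_entropy(two_lumps))

Minimizing this quantity with respect to the parameters of the learning machine (e.g., by gradient descent through the errors e_i) drives the error distribution towards a concentrated PDF, in line with the Dirac-δ discussion above.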
Information Theory also provides an interesting framework for analyzing
and interpreting risk functionals, as shown in the work of [37]. These authors
demonstrated the following
 