6.7 Theoretical Questions
6.7.1 The Probabilistic Framework
Learning from examples makes sense only if there is some regularity in the data. Within the statistical formulation of training, it is generally assumed that the patterns are {input-output} pairs drawn independently at random from an unknown probability distribution p(x, y). In particular, the probability of the learning set L_M is

p(L_M) = \prod_{k=1}^{M} p(x_k, y_k) = \prod_{k=1}^{M} p(x_k) P(y_k | x_k).
The second term above corresponds to the following process: first the input x_k is drawn at random with probability density p(x_k); then, given x_k, the class y_k is selected with conditional probability P(y_k | x_k). The case of deterministic classes considered in this chapter is just a particular case of this formulation.
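To make the generative description above concrete, here is a minimal sketch in Python; the logistic form chosen for P(y | x) and all names (w_true, p_y_given_x) are illustrative assumptions, not part of the text. It draws M independent (x_k, y_k) pairs and evaluates the factorized log-probability of the resulting training set.

```python
import numpy as np

rng = np.random.default_rng(0)
M, n = 100, 5          # number of examples, input dimension

def p_y_given_x(y, x, w_true):
    """Illustrative conditional P(y | x): a logistic model with y in {-1, +1}."""
    return 1.0 / (1.0 + np.exp(-y * (w_true @ x)))

w_true = rng.standard_normal(n)   # hypothetical parameters of the data source

# Draw the training set L_M: x_k ~ p(x) (here standard Gaussian), then y_k ~ P(y | x_k)
X = rng.standard_normal((M, n))
prob_plus = np.array([p_y_given_x(+1, x, w_true) for x in X])
Y = np.where(rng.random(M) < prob_plus, 1, -1)

# log p(L_M) = sum_k [ log p(x_k) + log P(y_k | x_k) ]  (logarithm of the product formula)
log_p_x = np.sum(-0.5 * X**2 - 0.5 * np.log(2 * np.pi), axis=1)   # Gaussian density of x_k
log_p_y = np.log([p_y_given_x(y, x, w_true) for x, y in zip(X, Y)])
log_p_LM = np.sum(log_p_x + log_p_y)
print(log_p_LM)
```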
Remark. The "teacher-student" paradigm, suggested in Chap. 2 for testing regression models, is frequently used in classification theory. It is usually assumed that the components of the input patterns are either Gaussian variables,

p(x_i) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x_i^2}{2}\right),

or uniformly distributed variables within some interval [-a, a]: p(x_i) = 1/(2a). Then, the classes of the input vectors x_k are defined by a "teacher" network of weights w. For example, if the teacher is a deterministic perceptron, one has P(y_k | x_k) = \Theta(y_k \, w \cdot x_k). The aim of learning is to find weights w that convey good generalization properties to the "student". Besides the examples of L_M, the "student" is expected to classify correctly any pattern drawn at random with the same probability p(x) as the training set.
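A minimal sketch of this teacher-student setting, assuming standard-Gaussian input components and a deterministic perceptron teacher (function names such as generalization are illustrative): the teacher assigns y_k = sign(w_teacher · x_k), so P(y_k | x_k) = Θ(y_k w_teacher · x_k) equals 1 for the assigned class and 0 otherwise, and the student is scored by its agreement with the teacher on fresh patterns drawn from the same p(x).

```python
import numpy as np

rng = np.random.default_rng(1)
M, n = 200, 10

# Teacher perceptron: its weights define the "true" classification rule
w_teacher = rng.standard_normal(n)

# Inputs drawn i.i.d. from p(x): standard Gaussian components
X = rng.standard_normal((M, n))

# Deterministic classes: y_k = sign(w_teacher . x_k), i.e. P(y_k | x_k) = Theta(y_k w_teacher . x_k)
Y = np.sign(X @ w_teacher)

def generalization(w_student, n_test=10_000):
    """Probability that the student agrees with the teacher on a fresh pattern from p(x)."""
    X_test = rng.standard_normal((n_test, n))
    return np.mean(np.sign(X_test @ w_student) == np.sign(X_test @ w_teacher))

print(generalization(w_teacher))               # 1.0: the teacher generalizes perfectly
print(generalization(rng.standard_normal(n)))  # ~0.5 on average for random student weights
```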
Because the training set L_M is probabilistic, the student weights w depend on the particular realization of L_M. Therefore, w is a random variable. In this section we apply the method of Bayesian inference to the determination of the probability distribution p(w | L_M). This method is based on Bayes' theorem, introduced in Chap. 1, which can formally be written as follows:

p(w | L_M) P_B(L_M) = P(L_M | w) p_0(w),

where P_B(L_M) is defined below; p_0(w) is the a priori probability of the classifier parameters (the weights in the case of neural networks) before learning, and P(L_M | w), called the evidence, is the probability of the training set L_M when the student has weights w. The a posteriori probability density function for the student weights is therefore

p(w | L_M) = \frac{P(L_M | w) p_0(w)}{P_B(L_M)}.
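Continuing the perceptron teacher-student example, the sketch below (assuming an isotropic Gaussian prior p_0(w) and the deterministic-teacher evidence P(L_M | w) = \prod_k \Theta(y_k w \cdot x_k); the random candidate scheme is only illustrative) evaluates the unnormalized posterior P(L_M | w) p_0(w) for candidate student weights.

```python
import numpy as np

rng = np.random.default_rng(2)
M, n = 50, 3
w_teacher = rng.standard_normal(n)
X = rng.standard_normal((M, n))
Y = np.sign(X @ w_teacher)            # training set L_M labelled by the teacher

def log_prior(w):
    """A priori density p_0(w): isotropic Gaussian over student weights (an assumption)."""
    return -0.5 * np.dot(w, w) - 0.5 * n * np.log(2 * np.pi)

def log_evidence(w):
    """log P(L_M | w) for a deterministic perceptron student: 0 if every example is
    classified correctly (all Theta factors equal 1), -inf otherwise."""
    return 0.0 if np.all(Y * (X @ w) > 0) else -np.inf

def log_unnormalized_posterior(w):
    # Bayes theorem: p(w | L_M) P_B(L_M) = P(L_M | w) p_0(w)
    return log_evidence(w) + log_prior(w)

# Crude illustration: score a few random candidate weight vectors against the data
for _ in range(5):
    print(log_unnormalized_posterior(rng.standard_normal(n)))
print(log_unnormalized_posterior(w_teacher))   # the teacher itself always has nonzero posterior
```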