Discrimination - Neural Networks: Methodology and Applications


Remark. It is possible to encode the two classes using $z \in \{0, 1\}$. Codes $\{-1, +1\}$ and $\{0, 1\}$ are formally equivalent. They are related by the transformation $y = 2z - 1$. The $\pm 1$ encoding adopted in this Chapter is elegant and presents advantages in programming. However, in electronic implementations, it may be useful to use the $\{0, 1\}$ code.
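The equivalence of the two codes can be illustrated with a minimal Python sketch. The function names `to_pm1` and `to_01` are hypothetical, and the inverse map $z = (y + 1)/2$ is not stated in the text; it simply follows from solving $y = 2z - 1$ for $z$.

```python
def to_pm1(z_codes):
    """Map {0, 1} class codes to {-1, +1} codes using y = 2z - 1."""
    return [2 * z - 1 for z in z_codes]


def to_01(y_codes):
    """Inverse map (derived, not from the text): z = (y + 1) / 2."""
    return [(y + 1) // 2 for y in y_codes]


z = [0, 1, 1, 0]
y = to_pm1(z)          # codes in {-1, +1}
z_back = to_01(y)      # round trip recovers the {0, 1} codes
```

Because the map is one-to-one, no information is lost by switching codes; the choice is a matter of convenience.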

The output $\sigma(x; w)$ of the classifier, whether a neural network or any other classifier, depends on its input $x$ and on its parameters, hereafter denoted by $w$ even if the classifier is not a neural network. The output corresponding to input $x_k \in L_M$ will be denoted by $\sigma_k(x; w)$ or simply $\sigma_k$ ($\sigma_k \in \{-1, +1\}$). The classifier is able to classify correctly the example $x_k$ if $\sigma_k = y_k$, i.e., if the following condition of correct classification is obeyed:

$$\sigma_k y_k > 0.$$

Otherwise, $\sigma_k \neq y_k$, so that $\sigma_k y_k < 0$.

6.1.1 Training and Generalization Errors

The quality of training may be assessed through the training (or learning) error $\varepsilon_t(w)$, which is the fraction of misclassified examples of $L_M$. From the condition of correct classification, we have

$$\varepsilon_t(w) = \frac{1}{M} \sum_{k=1}^{M} \Theta\bigl(-y_k \, \sigma_k(x; w)\bigr),$$

where $\Theta(u)$ is the Heaviside function, which takes on the value 1 if its argument is positive or zero, and 0 otherwise,

$$\Theta(u) = \begin{cases} 1 & \text{if } u \geq 0, \\ 0 & \text{if } u < 0. \end{cases}$$
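As a rough illustration, the training error can be computed directly from its definition. The sketch below assumes the classifier outputs $\sigma_k$ and the class codes $y_k$ are already available as lists of $\pm 1$ values; the function names `heaviside` and `training_error` and the sample data are hypothetical.

```python
def heaviside(u):
    """Theta(u): 1 if u >= 0, 0 otherwise (the convention used in the text)."""
    return 1 if u >= 0 else 0


def training_error(outputs, labels):
    """Fraction of misclassified examples: (1/M) * sum_k Theta(-y_k * sigma_k)."""
    if len(outputs) != len(labels) or not labels:
        raise ValueError("outputs and labels must be non-empty and of equal length")
    m = len(labels)
    return sum(heaviside(-y * s) for s, y in zip(outputs, labels)) / m


# Made-up example: outputs and codes in {-1, +1}; examples 2 and 4 disagree.
sigma = [+1, -1, +1, +1]
y = [+1, +1, +1, -1]
error = training_error(sigma, y)
```

Since $\sigma_k y_k$ is always $+1$ or $-1$, the argument of $\Theta$ is never zero here, so each term counts exactly one misclassified example.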

In fact, the goal of learning a classification task using the examples in $L_M$ is mainly to determine the classifier parameters that will correctly classify new inputs, that is, generalize. Obviously, the patterns to be classified are unknown, but we will assume that they present the same regularities as those used for training. Mathematically, we consider that the input vectors $x$ are realizations of a real-valued random vector $X$. Similarly, the output $y$ (that is, the code given to the class of $x$) is the realization of a discrete random variable $Y$. We thus assume that there is an unknown probability density $p_{XY}(x, y) \equiv p_X(x) \, P_Y(y \mid x)$ from which are drawn

• the inputs and outputs of the training set,

• the new inputs, whose class, given by $P_Y(y \mid x)$, is unknown.
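The factorization $p_X(x) \, P_Y(y \mid x)$ suggests a two-step sampling procedure: draw an input from $p_X$, then draw its class code from $P_Y(y \mid x)$. The sketch below is only an illustration under invented assumptions: the text says the true density is unknown, so the Gaussian $p_X$, the logistic form of $P_Y(+1 \mid x)$, the vector `w_true`, and all function names are hypothetical choices.

```python
import math
import random


def sample_x(rng, dim=2):
    """Draw x from an assumed p_X: independent standard Gaussians."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def prob_plus(x, w_true=(1.0, -1.0)):
    """Assumed P_Y(y = +1 | x): a logistic function of w_true . x."""
    a = sum(wi * xi for wi, xi in zip(w_true, x))
    return 1.0 / (1.0 + math.exp(-a))


def sample_y(rng, x):
    """Draw the class code y in {-1, +1} from the assumed P_Y(y | x)."""
    return +1 if rng.random() < prob_plus(x) else -1


def draw_examples(rng, m):
    """Draw m (x, y) pairs: first x from p_X, then y from P_Y(y | x)."""
    xs = [sample_x(rng) for _ in range(m)]
    ys = [sample_y(rng, x) for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = draw_examples(rng, 100)      # training set
new_x = [sample_x(rng) for _ in range(5)]       # new inputs, classes not observed
```

The training set and the new inputs come from the same `sample_x`, which mirrors the assumption that new patterns present the same regularities as the training examples.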
