$$V(z_k) \;=\; \frac{1}{2}\bigl(y_k - \tanh(\mathbf{w}\cdot\mathbf{x}_k)\bigr)^2 \;=\; \frac{1}{2}\bigl(1 - y_k\tanh(\mathbf{w}\cdot\mathbf{x}_k)\bigr)^2 \;=\; \frac{1}{2}\bigl(1 - \tanh(z_k)\bigr)^2.$$
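Differentiating this cost with respect to $z_k$, using $\mathrm{d}\tanh(z)/\mathrm{d}z = 1/\cosh^2(z)$, gives the factor that appears in the gradient-descent prefactor:

$$-\frac{\mathrm{d}V}{\mathrm{d}z_k} \;=\; \bigl(1 - \tanh(z_k)\bigr)\,\frac{1}{\cosh^2(z_k)} \;=\; \frac{1 - \tanh(z_k)}{\cosh^2(z_k)}.$$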
The corresponding modification of the weights with the algorithm of gradient
descent has the form given previously with
$$c_k(t) \;=\; \frac{\mu}{M}\,\frac{1 - \tanh(z_k)}{\cosh^2(z_k)} \;=\; \frac{\mu}{M}\,\frac{1 - \tanh\bigl(\|\mathbf{w}\|\gamma_k\bigr)}{\cosh^2\bigl(\|\mathbf{w}\|\gamma_k\bigr)}.$$
The latter relation is similar to the corresponding relation of Minimerror. Here, $\|\mathbf{w}\|$ plays the same role as $\beta$. The essential difference between the two algorithms is that $\beta$ is a controllable parameter of Minimerror, whereas $\|\mathbf{w}\|$ cannot be controlled when minimizing the Least Squares cost.
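This update rule can be sketched in a few lines of NumPy (a minimal illustration, not the book's code; the function name, learning rate `mu`, epoch count, and random initialization are illustrative choices, not taken from the text):

```python
import numpy as np

def least_squares_perceptron(X, y, mu=0.1, epochs=500, seed=0):
    """Minimize V = (1/2) * sum_k (1 - tanh(z_k))^2 by gradient descent,
    where z_k = y_k * w . x_k is the aligned local field of pattern k.

    X : (M, N) array of patterns, y : (M,) array of labels in {-1, +1}.
    Returns the learned weight vector w.
    """
    M, N = X.shape
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=N)   # small random initial weights
    for _ in range(epochs):
        z = y * (X @ w)                 # aligned fields z_k = ||w|| * gamma_k
        # prefactor c_k(t) = (mu / M) * (1 - tanh z_k) / cosh^2 z_k
        c = (mu / M) * (1.0 - np.tanh(z)) / np.cosh(z) ** 2
        w += (c * y) @ X                # dw = sum_k c_k * y_k * x_k
    return w
```

Since every $c_k$ is strictly positive, each step adds a positive combination of the $y_k\mathbf{x}_k$, so $\|\mathbf{w}\|$ tends to grow as training proceeds; this is the sense in which $\|\mathbf{w}\|$, unlike $\beta$, is not under the user's control.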
6.4.5 Example of Application: The Classification of Sonar Signals
The data for this application are available at http://www.ics.uci.edu/mlearn/MLRepository.html [Blake 1998]. The problem is the discrimination of sonar signals generated by cylindrical mines from those generated by rocks of the same shape. The benchmark contains 208 preprocessed signals, each defined by $N = 60$ real values $x_i \in [0, 1]$ ($i = 1, \dots, N$),
and their corresponding classes. The first 104 signals are traditionally used as training examples; the last 104 are used for estimating the generalization error. Although this benchmark has been used to test many learning algorithms with many different network architectures, we discovered using Minimerror that not only are the training set and the test set linearly separable, but the complete set of 208 signals is linearly separable as well [Torres Moreno et al. 1998]. That result was subsequently confirmed by the algorithm of Ho and Kashyap (see Chap. 2). The left part of Fig. 6.14 shows the distances of the patterns to the separating hyperplane found by Minimerror, with a sign corresponding to the class assigned by the perceptron trained with the first 104 patterns. The solution has a margin $\kappa = 0.1226$: none of the training examples lies at a distance smaller than $\kappa$ from the hyperplane. In contrast, among the 104 test patterns, that hyperplane makes 23 classification errors. The right part of Fig. 6.14 shows the distances (with a sign corresponding to the class assigned by the trained perceptron) after learning with the whole database of 208 signals. In that case the margin is smaller ($\kappa = 0.0028$). The histogram of the pattern stabilities with respect to that hyperplane is shown in Fig. 6.15. We will see later that, if we assume that