$$V(z_k) \;=\; \frac{1}{2}\bigl(y_k - \tanh(\mathbf{w}\cdot\mathbf{x}_k)\bigr)^2 \;=\; \frac{1}{2}\bigl(1 - y_k\tanh(\mathbf{w}\cdot\mathbf{x}_k)\bigr)^2 \;=\; \frac{1}{2}\bigl(1 - \tanh(z_k)\bigr)^2.$$
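Differentiating this cost with respect to $z_k$, using $\mathrm{d}\tanh(z)/\mathrm{d}z = 1/\cosh^2(z)$, gives the factor that appears in the gradient-descent prefactor:

$$-\frac{\mathrm{d}V}{\mathrm{d}z_k} \;=\; \bigl(1 - \tanh(z_k)\bigr)\,\frac{1}{\cosh^2(z_k)} \;=\; \frac{1 - \tanh(z_k)}{\cosh^2(z_k)}.$$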
The corresponding modification of the weights with the algorithm of gradient
descent has the form given previously with
$$c_k(t) \;=\; \frac{\mu}{M}\,\frac{1 - \tanh(z_k)}{\cosh^2(z_k)} \;=\; \frac{\mu}{M}\,\frac{1 - \tanh\bigl(\|\mathbf{w}\|\gamma_k\bigr)}{\cosh^2\bigl(\|\mathbf{w}\|\gamma_k\bigr)}.$$
The latter relation is similar to the corresponding relation of Minimerror. Here, $\|\mathbf{w}\|$ plays the same role as $\beta$. The essential difference between the two algorithms is that $\beta$ is a controllable parameter of Minimerror, whereas $\|\mathbf{w}\|$ cannot be controlled when minimizing the Least Squares cost.
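This update rule can be sketched in a few lines of NumPy (a minimal illustration, not the book's code; the function name, learning rate `mu`, epoch count, and random initialization are illustrative choices, not taken from the text):

```python
import numpy as np

def least_squares_perceptron(X, y, mu=0.1, epochs=500, seed=0):
    """Minimize V = (1/2) * sum_k (1 - tanh(z_k))^2 by gradient descent,
    where z_k = y_k * w . x_k is the aligned local field of pattern k.

    X : (M, N) array of patterns, y : (M,) array of labels in {-1, +1}.
    Returns the learned weight vector w.
    """
    M, N = X.shape
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=N)   # small random initial weights
    for _ in range(epochs):
        z = y * (X @ w)                 # aligned fields z_k = ||w|| * gamma_k
        # prefactor c_k(t) = (mu / M) * (1 - tanh z_k) / cosh^2 z_k
        c = (mu / M) * (1.0 - np.tanh(z)) / np.cosh(z) ** 2
        w += (c * y) @ X                # dw = sum_k c_k * y_k * x_k
    return w
```

Since every $c_k$ is strictly positive, each step adds a positive combination of the $y_k\mathbf{x}_k$, so $\|\mathbf{w}\|$ tends to grow as training proceeds; this is the sense in which $\|\mathbf{w}\|$, unlike $\beta$, is not under the user's control.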
6.4.5 Example of Application: The Classification of Sonar Signals
The data for this application are available at http://www.ics.uci.edu/mlearn/MLRepository.html [Blake 1998]. The problem is the discrimination of sonar signals generated by cylindrical mines from those generated by rocks of the same shape. The benchmark contains 208 preprocessed signals, each defined by $N = 60$ real values $x_i \in [0, 1]$ ($i = 1, \dots, N$),
and their corresponding classes. The first 104 signals are traditionally used as training examples; the last 104 are used for estimating the generalization error. Although this benchmark has been used to test many learning algorithms with many different network architectures, we discovered using Minimerror that not only are the training set and the test set linearly separable, but the complete set of 208 signals is linearly separable as well [Torres Moreno et al. 1998]. That result was subsequently confirmed by the algorithm of Ho and Kashyap (see Chap. 2). The left part of Fig. 6.14 shows the distances of the patterns to the separating hyperplane found by Minimerror, with a sign corresponding to the class assigned by the perceptron trained with the first 104 patterns. The solution has a margin $\kappa = 0.1226$: none of the training examples lies at a distance smaller than $\kappa$ from the hyperplane. In contrast, among the 104 test patterns, that hyperplane makes 23 classification errors. The right part of Fig. 6.14 shows the distances (with a sign corresponding to the class assigned by the trained perceptron) after learning with the whole database of 208 signals. In that case the margin is smaller ($\kappa = 0.0028$). The histogram of the pattern stabilities with respect to that hyperplane is shown in Fig. 6.15. We will see later that, if we assume that