Fig. 6.8. Behavior of the upper and lower bounds as a function of the number of iterations, for a case with γ_min = 0.5 and ||x_max|| = 2
x_k used at that iteration. On the other hand, the successive learning steps increase the norm of the weights. Therefore, the relative correction produced by an example, whenever its presentation generates a weight update, decreases during training. As a consequence, the hyperplane undergoes successive reorientations of decreasing amplitude. If some examples are very close to the hyperplane (small γ_min), the corrections must become small enough to reach the necessary precision. This explains why the convergence time is inversely proportional to γ_min.
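To make the argument concrete, here is a minimal sketch (not taken from the text; the toy data set and variable names are assumptions) using the classical perceptron rule w ← w + y_k x_k. The printed ratio is the relative correction, which tends to shrink as ||w|| grows during training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy separable data: labels follow the sign of the first coordinate,
# then each point is pushed away from the boundary to create a margin.
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] >= 0.0, 1, -1)
X[:, 0] += 0.5 * y

w = np.zeros(2)
for epoch in range(20):
    for xk, yk in zip(X, y):
        if yk * (w @ xk) <= 0:            # aligned field not positive: update needed
            correction = yk * xk
            w = w + correction            # classical perceptron correction
            rel = np.linalg.norm(correction) / np.linalg.norm(w)
            print(f"relative correction = {rel:.3f}")  # tends to decrease as ||w|| grows
```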
Remark 2. Instead of considering the inputs x_k of classes y_k, it is equivalent to consider the input vectors x̃_k ≡ y_k x_k, all of the same class +1. If w classifies correctly the set of x_k, it will do so with the x̃_k, since the aligned fields are unaltered by that transformation: y_k w · x_k = w · x̃_k > 0. The computation time of training algorithms may be shortened if that transformation is applied to the training set.
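As an illustration only (a sketch with hypothetical names, assuming NumPy arrays X and y for the examples and their labels), the transformation simply multiplies each input by its label; the aligned fields, and hence the classification, are unaffected.

```python
import numpy as np

def to_single_class(X, y):
    """Map each (x_k, y_k) to x_tilde_k = y_k * x_k, all nominally of class +1."""
    return X * y[:, None]

X = np.array([[1.0, 2.0], [-0.5, 1.5]])
y = np.array([1.0, -1.0])
w = np.array([0.8, -0.3])

X_tilde = to_single_class(X, y)

# The aligned fields are unchanged: y_k (w . x_k) == w . x_tilde_k for every k,
# so w classifies the original pairs correctly iff w . x_tilde_k > 0 for all k.
assert np.allclose(y * (X @ w), X_tilde @ w)
```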
6.4.3 Training by Minimization of a Cost Function
Most training algorithms compute w through the minimization of a differen-
tiable cost function, which is the sum of partial costs per example. We have
seen that, for correct classification, the examples should have positive aligned
fields. It is therefore reasonable to consider partial costs that are functions of
z_k: V(z_k). The cost function to be minimized is

C(w) = (1/M) Σ_{k=1}^{M} V(z_k).
It depends on w through the aligned fields of the examples. We will see later
that the assumption of an additive cost over the examples is consistent with
the hypothesis that the examples are independent random variables.
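The following sketch (hypothetical helper names; the particular partial cost is an assumption, since the text has not yet specified V) evaluates such a cost function as the mean of partial costs over the aligned fields of the examples.

```python
import numpy as np

def cost(w, X, y, V):
    """C(w) = (1/M) * sum_k V(z_k), with aligned fields z_k = y_k (w . x_k)."""
    z = y * (X @ w)
    return np.mean(V(z))

# One possible partial cost, chosen here only for illustration (not fixed by
# the text): V(z) = max(0, -z), which penalizes negative aligned fields,
# i.e. misclassified examples.
def perceptron_V(z):
    return np.maximum(0.0, -z)

X = np.array([[1.0, 2.0], [-0.5, 1.5], [0.3, -1.0]])
y = np.array([1.0, -1.0, 1.0])
w = np.array([0.2, 0.1])
print(cost(w, X, y, perceptron_V))
```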