Remark 2. Even the examples that are correctly classified, i.e., with γ > 0, contribute to the cost function: the closer they lie to the hyperplane, the larger their contribution.
Remark 3.
If
β
is small enough that
βγ
k
1 for all
k
, then all the examples
contribute with almost the same prefactor, like in Hebb's rule discussed before.
Moreover, in the limit
β
0, the stabilities of all the examples are in the
region where the cost function is linear (in the neighborhood of
γ
= 0), and
the prefactor in the gradient of the cost function is the same for all examples.
→
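As a one-line check of Remarks 2 and 3, assume the per-example cost has the form V(γ_k) = (1 − tanh(βγ_k))/2, as in the usual formulation of Minimerror (the exact expression is defined earlier in the chapter). Its gradient prefactor is then

\[
-\frac{\partial V}{\partial \gamma_k}
= \frac{\beta}{2\cosh^2(\beta\gamma_k)}
\approx \frac{\beta}{2}\left(1 - (\beta\gamma_k)^2\right)
\to \frac{\beta}{2}
\quad\text{as } \beta\gamma_k \to 0,
\]

which is maximal at γ_k = 0 (Remark 2) and becomes the same for all examples when βγ_k ≪ 1 (Remark 3), recovering Hebb's rule up to the common factor β/2.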
Remark 4. For intermediate values of β, the examples with large stabilities with respect to the virtual window width 1/β (β|γ| ≫ 1) do not contribute significantly to training, since their prefactor in the gradient of the cost function is exponentially small: in the limit β|γ| ≫ 1, one has 1/cosh²(βγ) < 4 exp(−2β|γ|). For example, if β|γ| > 5, the prefactor is of order 10⁻⁴. Loosely speaking, the algorithm uses for learning only the examples lying inside a virtual window of width 1/β on both sides of the hyperplane.
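The window effect is easy to check numerically; the sketch below assumes the per-example cost (1 − tanh(βγ))/2, so the gradient prefactor is proportional to 1/cosh²(βγ):

```python
import numpy as np

beta = 1.0
gamma = np.array([0.0, 0.5, 1.0, 2.0, 5.0, 10.0])   # example stabilities

# gradient prefactor of (1 - tanh(beta*gamma))/2, up to the factor beta/2
prefactor = 1.0 / np.cosh(beta * gamma) ** 2
# exponential bound quoted in Remark 4
bound = 4.0 * np.exp(-2.0 * beta * np.abs(gamma))

for g, p, b in zip(gamma, prefactor, bound):
    print(f"beta*|gamma| = {beta * g:5.1f}   1/cosh^2 = {p:.2e}   bound = {b:.2e}")
# at beta*|gamma| = 5 the prefactor is about 1.8e-04, i.e. of order 1e-4,
# so examples outside the virtual window barely influence the gradient
```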
The above remarks form the basis of the Minimerror algorithm. The hyperparameter β, which increases throughout the iterations to optimize the solution, allows one to obtain a linear separation with large margin if one exists, or to find surfaces that are locally discriminant (with large margins) otherwise.
The weights are initialized using Hebb's rule, which corresponds to β = 0. The iterations start with β sufficiently small for all the patterns to be inside the virtual window. If ‖x_max‖ is the norm of the example of largest norm, one can use β_ini = 10⁻²/‖x_max‖. Then, at each training step (iteration) the weights are updated and β is increased by a small amount δβ. This procedure is known in the literature as deterministic annealing, a concept close to that of simulated annealing used in optimization problems (see Chap. 8 on optimization).
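As an illustration, here is a minimal sketch of this deterministic-annealing loop, assuming the per-example cost (1 − tanh(βγ_k))/2 with stability γ_k = y_k w·x_k/‖w‖; the function and variable names are illustrative, not the book's:

```python
import numpy as np

def epoch(w, X, y, beta, lr):
    """One gradient-descent sweep on E = sum_k (1 - tanh(beta*gamma_k))/2."""
    nw = np.linalg.norm(w)
    gamma = y * (X @ w) / nw                           # stabilities gamma_k
    pref = beta / (2.0 * np.cosh(beta * gamma) ** 2)   # per-example prefactor
    dgamma = y[:, None] * X / nw - np.outer(gamma, w) / nw**2
    return w + lr * (pref[:, None] * dgamma).sum(axis=0)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.where(X @ rng.normal(size=10) > 0, 1.0, -1.0)   # separable labels

w = (y[:, None] * X).sum(axis=0)                   # Hebb's rule (beta = 0)
beta = 1e-2 / np.linalg.norm(X, axis=1).max()      # all patterns in the window
dbeta, lr = 1e-2, 1e-2
for t in range(1000):                              # deterministic annealing
    w = epoch(w, X, y, beta, lr)
    beta += dbeta                                  # shrink the virtual window
```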
A heuristic improvement consists in considering two different values of β: β+ for the examples with positive stability and β− for those with negative stability. In order to keep the number of parameters small, the ratio β+/β− does not change during training. Thus, the Minimerror algorithm has three parameters: the learning rate µ, the annealing step δβ, and the asymmetry β± ≡ β+/β−. It proceeds as follows:
Minimerror Algorithm
• Parameter Settings
1. learning rate µ (suggested value: 10⁻²),
2. ratio β± (suggested value: 6),
3. annealing step δβ+ (suggested value: 10⁻²).
• Initialization
1. iteration counter: t = 0,
2. weights: w(0) (suggested initialization: apply Hebb's rule and then normalize the weights to ‖w‖ = √(N + 1)).
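As a sketch, the parameter settings and the initialization steps listed above might look as follows in code (the bias handling, the Hebbian scaling, and how β+ and β− are derived from the ratio β± are assumptions; the iteration steps of the algorithm continue beyond this excerpt):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))            # examples x_k (N = 5 inputs)
y = np.where(X[:, 0] > 0, 1.0, -1.0)     # labels y_k = +/-1

# -- Parameter settings (suggested values from the text) --
mu = 1e-2          # learning rate
ratio = 6.0        # asymmetry beta_plus / beta_minus
dbeta = 1e-2       # annealing step delta(beta_plus)

# -- Initialization --
t = 0                                           # iteration counter
Xb = np.hstack([X, np.ones((len(X), 1))])       # append a bias input
w = (y[:, None] * Xb).sum(axis=0)               # Hebb's rule
w *= np.sqrt(Xb.shape[1]) / np.linalg.norm(w)   # ||w|| = sqrt(N + 1)

# beta_plus small enough that every pattern starts inside the virtual window
beta_plus = 1e-2 / np.linalg.norm(Xb, axis=1).max()
beta_minus = beta_plus / ratio                  # ratio stays fixed during training
```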