Fig. 6.13. Left: partial cost of Minimerror; right: prefactor of the weight corrections, for two values of the hyperparameter $\beta$.
through the use of a hyperparameter $\beta$. This cost is not a function of the aligned field $z$, but of the stability $\gamma$:
$$V_\beta(\gamma) = \frac{1}{2}\,\bigl[1 - \tanh(\beta\gamma)\bigr].$$
The contribution of each example to learning is proportional to
$$\frac{\partial V_\beta(\gamma)}{\partial \mathbf{w}} \propto \frac{\beta}{2\cosh^2(\beta\gamma)}\,\bigl[\,y\,\mathbf{x} - \gamma\,\mathbf{w}\,\bigr].$$
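For completeness, this proportionality is just the chain rule applied to $V_\beta$; a sketch, assuming the usual definition of the stability, $\gamma = y\,\mathbf{w}\cdot\mathbf{x}/\|\mathbf{w}\|$ (the overall minus sign is absorbed into the proportionality):

$$\frac{\partial V_\beta}{\partial \gamma} = -\frac{\beta}{2\cosh^2(\beta\gamma)},
\qquad
\frac{\partial \gamma}{\partial \mathbf{w}} = \frac{1}{\|\mathbf{w}\|}\left[\,y\,\mathbf{x} - \gamma\,\frac{\mathbf{w}}{\|\mathbf{w}\|}\,\right],$$

so that, for normalized weights ($\|\mathbf{w}\| = 1$),

$$\frac{\partial V_\beta(\gamma)}{\partial \mathbf{w}} = -\frac{\beta}{2\cosh^2(\beta\gamma)}\,\bigl[\,y\,\mathbf{x} - \gamma\,\mathbf{w}\,\bigr].$$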
The partial cost, as well as the prefactor $\cosh^{-2}(\beta\gamma)$, is shown in Fig. 6.13 as a function of $\gamma$, for two different values of $\beta$. The hyperparameter $\beta$ has a simple intuitive meaning: the larger $\beta$, the narrower the region on both sides of the hyperplane where the examples contribute significantly to learning. The examples that contribute effectively to the learning process are those within a virtual window of width proportional to $1/\beta$, centered on the hyperplane. Owing to the factor $\cosh^{-2}(\beta\gamma)$, the contribution of the examples outside this window is vanishingly small.
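The window effect is easy to see numerically. Below is a minimal sketch (not from the text: the function name, the dimension, and the sample values are illustrative), assuming normalized weights so that the stability of an example $(\mathbf{x}, y)$ is $\gamma = y\,\mathbf{w}\cdot\mathbf{x}$. For $\beta = 5$, an example with stability $\gamma = 2$, far outside the window, contributes almost nothing, while one with $\gamma = 0.1$ dominates.

```python
import numpy as np

def minimerror_grad_contribution(x, y, w, beta):
    """Per-example contribution to the gradient of V_beta(gamma).

    Assumes the weight vector w is normalized (||w|| = 1), so that the
    stability of example (x, y) is gamma = y * (w @ x).
    """
    gamma = y * (w @ x)
    prefactor = beta / (2.0 * np.cosh(beta * gamma) ** 2)
    return prefactor * (y * x - gamma * w)

# Illustrative values only: compare the size of the contribution for an
# example inside the virtual window (|gamma| small) and one far outside it.
rng = np.random.default_rng(0)
w = rng.normal(size=3)
w /= np.linalg.norm(w)             # keep ||w|| = 1, as assumed above

for beta in (1.0, 5.0):            # two hyperparameter values, as in Fig. 6.13
    for gamma_target in (0.1, 2.0):
        # Build an example whose stability is exactly gamma_target:
        # gamma_target * w plus a component orthogonal to w.
        x_perp = rng.normal(size=3)
        x_perp -= (x_perp @ w) * w
        x = gamma_target * w + x_perp
        g = minimerror_grad_contribution(x, y=1, w=w, beta=beta)
        print(f"beta={beta}, gamma={gamma_target}: "
              f"|contribution| = {np.linalg.norm(g):.4f}")
```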
Remark 1. With reference to the algorithms whose partial costs depend on the aligned field, the derivative of $V_\beta(\gamma)$ with respect to the weights exhibits an extra term, proportional to $\gamma\,\mathbf{w}$. That quantity, which is subtracted from $y\,\mathbf{x}$ between the square brackets in the gradient of the cost function, is the component of $y\,\mathbf{x}$ parallel to $\mathbf{w}$. It only contributes to changing the norm of the weight vector, without modifying its orientation. If the weights are normalized at each iteration, that term can be neglected.
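A one-line check of this remark (a sketch, again assuming $\|\mathbf{w}\| = 1$, so that $\gamma = y\,\mathbf{w}\cdot\mathbf{x}$): the term $\gamma\,\mathbf{w}$ is exactly the projection of $y\,\mathbf{x}$ onto $\mathbf{w}$, and what remains between the brackets is orthogonal to $\mathbf{w}$,

$$\gamma\,\mathbf{w} = (y\,\mathbf{x}\cdot\mathbf{w})\,\mathbf{w},
\qquad
\mathbf{w}\cdot\bigl[\,y\,\mathbf{x} - \gamma\,\mathbf{w}\,\bigr] = y\,\mathbf{x}\cdot\mathbf{w} - \gamma\,\|\mathbf{w}\|^2 = 0,$$

so the bracketed correction rotates the weight vector without changing its norm, to first order in the learning rate.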