Fig. 6.13. Left: partial cost of Minimerror; right: prefactor of the weight corrections, for two values of the hyperparameter β
through the use of a hyperparameter β . This cost is not a function of the
aligned field z , but of the stability γ :
$$V_\beta(\gamma) = \frac{1}{2}\left[1 - \tanh(\beta\gamma)\right].$$
The contribution of each example to learning is proportional to
$$\frac{\partial V_\beta(\gamma)}{\partial w} = -\frac{\beta}{2\cosh^2(\beta\gamma)}\left[\,y\,x - \gamma\,w\,\right].$$
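As a concrete illustration, the following is a minimal sketch of the per-example cost and gradient, assuming the weights are kept normalized ($\|w\| = 1$, as in Remark 1 below) so that the stability is simply $\gamma = y\,(w \cdot x)$; the function names and the NumPy setting are illustrative choices, not part of the original presentation.

```python
import numpy as np

def minimerror_cost(w, x, y, beta):
    """Partial cost V_beta(gamma) = (1/2) [1 - tanh(beta * gamma)].
    Assumes ||w|| = 1, so the stability is gamma = y * (w . x)."""
    gamma = y * np.dot(w, x)
    return 0.5 * (1.0 - np.tanh(beta * gamma))

def minimerror_grad(w, x, y, beta):
    """Per-example gradient of V_beta with respect to w:
    -beta / (2 cosh^2(beta * gamma)) * (y * x - gamma * w)."""
    gamma = y * np.dot(w, x)
    prefactor = -beta / (2.0 * np.cosh(beta * gamma) ** 2)
    return prefactor * (y * x - gamma * w)
```

A gradient step would then be $w \leftarrow w - \eta\,\partial V_\beta/\partial w$, followed by a renormalization of $w$, in line with Remark 1 below.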
The partial cost, as well as the prefactor $\beta/[2\cosh^2(\beta\gamma)]$ of the weight corrections, are shown in Fig. 6.13
as a function of γ , for two different values of β . The hyperparameter β has a
simple intuitive meaning: the larger β , the narrower the region on both sides
of the hyperplane where the examples contribute significantly to learning. The
examples that contribute effectively to the learning process are those within
a virtual window of width proportional to $1/\beta$ centered on the hyperplane.
Due to the factor $1/\cosh^2(\beta\gamma)$, the contribution of the examples outside this
window is vanishingly small.
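To make "vanishingly small" quantitative, a short estimate (added here for illustration, not in the original text): since $\cosh x = (e^{x} + e^{-x})/2$,

$$\frac{1}{\cosh^2(\beta\gamma)} = \frac{4}{\left(e^{\beta\gamma} + e^{-\beta\gamma}\right)^2} \approx 4\,e^{-2\beta|\gamma|} \quad \text{for } \beta|\gamma| \gg 1,$$

so an example at stability $|\gamma| = 3/\beta$, three window widths away from the hyperplane, contributes with a factor of only about $4e^{-6} \approx 10^{-2}$, with exponential suppression beyond that.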
Remark 1. Compared with the algorithms whose partial costs depend on the aligned field, the derivative of $V_\beta(\gamma)$ with respect to the weights exhibits an extra term, proportional to $\gamma\,w$. That quantity, subtracted from $y\,x$ between the square brackets in the gradient of the cost function, is the component of $y\,x$ parallel to $w$. It contributes only to changing the norm of the weight vector, without modifying its orientation. If the weights are normalized at each iteration, that term can be neglected.
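As a quick numerical sanity check of this remark (an illustrative addition; the dimension and random data are arbitrary), one can verify that once the component $\gamma\,w$ is subtracted, the bracketed term $y\,x - \gamma\,w$ is orthogonal to a unit-norm $w$, so a small step along it rotates the weight vector without changing its norm to first order:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=5)
w /= np.linalg.norm(w)           # keep ||w|| = 1, as in Remark 1
x = rng.normal(size=5)
y = 1.0

gamma = y * np.dot(w, x)         # stability for a unit-norm w
bracket = y * x - gamma * w      # term between the square brackets

print(np.dot(bracket, w))        # ~0, up to floating-point rounding
```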