Fig. 6.13. Left: partial cost of Minimerror; right: prefactor of the weight corrections, for two values of the hyperparameter $\beta$.
through the use of a hyperparameter $\beta$. This cost is not a function of the aligned field $z$, but of the stability $\gamma$:
$$V_\beta(\gamma) = \frac{1}{2}\,\bigl[1 - \tanh(\beta\gamma)\bigr].$$
The contribution of each example to learning is proportional to
$$\frac{\partial V_\beta(\gamma)}{\partial \mathbf{w}} \propto \frac{\beta}{2\cosh^2(\beta\gamma)}\,\bigl[\,y\,\mathbf{x} - \gamma\,\mathbf{w}\,\bigr].$$
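For completeness, this proportionality is just the chain rule applied to $V_\beta$; a sketch, assuming the usual definition of the stability, $\gamma = y\,\mathbf{w}\cdot\mathbf{x}/\|\mathbf{w}\|$ (the overall minus sign is absorbed into the proportionality):

$$\frac{\partial V_\beta}{\partial \gamma} = -\frac{\beta}{2\cosh^2(\beta\gamma)},
\qquad
\frac{\partial \gamma}{\partial \mathbf{w}} = \frac{1}{\|\mathbf{w}\|}\left[\,y\,\mathbf{x} - \gamma\,\frac{\mathbf{w}}{\|\mathbf{w}\|}\,\right],$$

so that, for normalized weights ($\|\mathbf{w}\| = 1$),

$$\frac{\partial V_\beta(\gamma)}{\partial \mathbf{w}} = -\frac{\beta}{2\cosh^2(\beta\gamma)}\,\bigl[\,y\,\mathbf{x} - \gamma\,\mathbf{w}\,\bigr].$$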
The partial cost, as well as the prefactor $\cosh^{-2}(\beta\gamma)$, is shown in Fig. 6.13 as a function of $\gamma$, for two different values of $\beta$. The hyperparameter $\beta$ has a simple intuitive meaning: the larger $\beta$, the narrower the region on both sides of the hyperplane where the examples contribute significantly to learning. The examples that contribute effectively to the learning process are those within a virtual window of width proportional to $1/\beta$, centered on the hyperplane. Owing to the factor $\cosh^{-2}(\beta\gamma)$, the contribution of the examples outside this window is vanishingly small.
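The window effect is easy to see numerically. Below is a minimal sketch (not from the text: the function name, the dimension, and the sample values are illustrative), assuming normalized weights so that the stability of an example $(\mathbf{x}, y)$ is $\gamma = y\,\mathbf{w}\cdot\mathbf{x}$. For $\beta = 5$, an example with stability $\gamma = 2$, far outside the window, contributes almost nothing, while one with $\gamma = 0.1$ dominates.

```python
import numpy as np

def minimerror_grad_contribution(x, y, w, beta):
    """Per-example contribution to the gradient of V_beta(gamma).

    Assumes the weight vector w is normalized (||w|| = 1), so that the
    stability of example (x, y) is gamma = y * (w @ x).
    """
    gamma = y * (w @ x)
    prefactor = beta / (2.0 * np.cosh(beta * gamma) ** 2)
    return prefactor * (y * x - gamma * w)

# Illustrative values only: compare the size of the contribution for an
# example inside the virtual window (|gamma| small) and one far outside it.
rng = np.random.default_rng(0)
w = rng.normal(size=3)
w /= np.linalg.norm(w)             # keep ||w|| = 1, as assumed above

for beta in (1.0, 5.0):            # two hyperparameter values, as in Fig. 6.13
    for gamma_target in (0.1, 2.0):
        # Build an example whose stability is exactly gamma_target:
        # gamma_target * w plus a component orthogonal to w.
        x_perp = rng.normal(size=3)
        x_perp -= (x_perp @ w) * w
        x = gamma_target * w + x_perp
        g = minimerror_grad_contribution(x, y=1, w=w, beta=beta)
        print(f"beta={beta}, gamma={gamma_target}: "
              f"|contribution| = {np.linalg.norm(g):.4f}")
```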
Remark 1. With reference to the algorithms whose partial costs depend on the aligned field, the derivative of $V_\beta(\gamma)$ with respect to the weights exhibits an extra term, proportional to $\gamma\,\mathbf{w}$. That quantity, which is subtracted from $y\,\mathbf{x}$ between the square brackets in the gradient of the cost function, is the component of $y\,\mathbf{x}$ parallel to $\mathbf{w}$. It only contributes to changing the norm of the weight vector, without modifying its orientation. If the weights are normalized at each iteration, that term can be neglected.
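A one-line check of this remark (a sketch, again assuming $\|\mathbf{w}\| = 1$, so that $\gamma = y\,\mathbf{w}\cdot\mathbf{x}$): the term $\gamma\,\mathbf{w}$ is exactly the projection of $y\,\mathbf{x}$ onto $\mathbf{w}$, and what remains between the brackets is orthogonal to $\mathbf{w}$,

$$\gamma\,\mathbf{w} = (y\,\mathbf{x}\cdot\mathbf{w})\,\mathbf{w},
\qquad
\mathbf{w}\cdot\bigl[\,y\,\mathbf{x} - \gamma\,\mathbf{w}\,\bigr] = y\,\mathbf{x}\cdot\mathbf{w} - \gamma\,\|\mathbf{w}\|^2 = 0,$$

so the bracketed correction rotates the weight vector without changing its norm, to first order in the learning rate.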