the network generalization capability. Therefore, care should be taken in selecting
the decay constant λ, because an inappropriate value can deteriorate the
generalization capability of the weight decay process. As a remedy, Weigend et al.
(1991) recommend updating the λ value on-line during the network training in
iterative steps.
Adding the penalty function in the weight decay and optimizing the augmented
performance index corresponds to the regularization method in which the penalty
term is added to the cost function to act as a restriction to the subsequent
optimization problem. In approximation theory, the added term penalizes the
curvature of the original solution, seeking a smoother solution of the
optimization problem.
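The penalized optimization described above can be sketched numerically. The following is a minimal illustration, not any particular author's implementation: a linear model trained by gradient descent on hypothetical data, where the penalty λ‖w‖² added to the squared-error cost makes each gradient step also shrink the weights toward zero.

```python
import numpy as np

# Hypothetical training data: 50 samples, 3 inputs, known true weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

lam = 0.01   # the decay constant lambda discussed in the text (assumed value)
lr = 0.01    # learning rate (assumed value)
w = np.zeros(3)

for _ in range(2000):
    grad_fit = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the data-fit term
    grad_pen = 2 * lam * w                     # gradient of the penalty term:
    w -= lr * (grad_fit + grad_pen)            # each step decays w toward zero

print(np.round(w, 2))
```

Because λ here is small, the recovered weights stay close to the unpenalized least-squares solution; increasing λ would shrink them further at the cost of a worse fit to the data, which is exactly the trade-off the text warns about.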
The regularization method is generally used to solve ill-posed problems . In the
theory of learning, the problems of learning smooth mappings from examples are
mostly ill-posed problems. For their solution Tikhonov (1963) proposed
optimization of the cost function I extended by a term J , which also represents a
cost function. Thus, the resulting cost function to be optimized becomes
$$ I_{res} = I + \lambda J, $$

where λ represents the regularization parameter, which determines the degree of
regularization in the sense of balancing the degree of smoothness of the solution
against its closeness to the training data. The regularization helps in stabilizing the
solution of the ill-posed problem because the added term, representing the penalty
to the original optimization problem, smooths the cost function (Morozov,
1984).
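The balancing role of λ can be seen on a small ridge-regression problem, a standard special case of this scheme in which J is the squared norm of the coefficients. All data below are hypothetical, and the closed-form solve stands in for the general minimization of I_res:

```python
import numpy as np

# Hypothetical data: 30 samples, 5 inputs, true weights all equal to 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
y = X @ np.ones(5) + 0.05 * rng.normal(size=30)

def ridge(lam):
    # Closed-form minimizer of I_res = ||y - Xw||^2 + lam * ||w||^2
    return np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)

w_small, w_large = ridge(1e-3), ridge(1e3)

# Small lambda: solution stays close to the training data (w near the truth).
# Large lambda: coefficients are driven toward zero (a "smoother" solution).
print(np.linalg.norm(w_small), np.linalg.norm(w_large))
```

The two solutions make the trade-off concrete: as λ grows, closeness to the data is sacrificed for a smaller-norm, more regular solution.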
The regularization approach determines the so-called Tikhonov functional

$$ I_{res}(f) = \sum_{i=1}^{n} \left( y_i - f(\mathbf{x}_i) \right)^2 + \lambda \left\| Pf \right\|^2 , $$
the first term of which represents the closeness to the data, and in the second term f
is the input-output function, P is a linear differential constraint operator, and $\|\cdot\|$ is
a norm on the function space to which Pf belongs. This operator also embodies the
a priori knowledge about the problem solution.
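A discretized sketch of this functional may help. On a grid, a second-difference matrix can stand in for the differential operator P (an assumption for illustration; the text does not fix a particular P), so that ‖Pf‖² penalizes curvature, and minimizing the quadratic functional reduces to one linear solve:

```python
import numpy as np

# Hypothetical data: noisy samples of sin(2*pi*x) on a grid of n points.
n = 100
x = np.linspace(0, 1, n)
rng = np.random.default_rng(2)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)

# P approximates the second derivative on the grid (interior rows only),
# so ||P f||^2 penalizes the curvature of the solution.
P = (np.diag(np.full(n, -2.0))
     + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1))[1:-1]

lam = 10.0  # regularization parameter (assumed value)

# Minimizing ||y - f||^2 + lam * ||P f||^2 over f gives the linear system
# (I + lam * P^T P) f = y.
f = np.linalg.solve(np.eye(n) + lam * P.T @ P, y)

# The regularized f has much smaller curvature norm than the raw data.
print(np.linalg.norm(P @ f), np.linalg.norm(P @ y))
```

The recovered f is a smoothed version of the noisy samples: the data term keeps it near y, while the curvature penalty suppresses the high-frequency noise, which is the stabilizing effect the regularization theory describes.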
To solve the regularization problem we proceed with the minimization of the
extended cost function $I_{res}$, using the resulting partial derivatives with respect to f in
order to build the Euler-Lagrange equation

$$ \hat{P}P f(\mathbf{x}) = \frac{1}{\lambda} \sum_{i=1}^{n} \left( y_i - f(\mathbf{x}_i) \right) \delta(\mathbf{x} - \mathbf{x}_i) , $$
in which the operator P and its adjoint operator $\hat{P}$ build the differential operator
$\hat{P}P$. Therefore, the above Euler-Lagrange equation is a partial differential equation.
Its solution can, therefore, be expressed as the integral transformation of the right-