Mozer and Smolensky (1988) used $\rho$ as a measure of relevance, defined as the
difference between the error after removing a unit and the error before removing
it. Karnin (1990), however, considers the sensitivity of the error to the
removal of individual connections and removes the low-sensitivity connections. Le
Cun et al. (1990), in turn, proposed the optimal brain damage procedure under the
assumption that the Hessian matrix $\mathbf{H}$ is diagonal, and estimated the
saliency of the weights from the second derivative of the error with respect to
the weights. Hassibi et al. (1992) removed the diagonality restriction on the
Hessian matrix and considered the general case of an arbitrary Hessian, which they
termed the optimal brain surgeon. Both approaches are based on the sensitivity of
the error function E to perturbations of the weights, using the Taylor series
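$$
\delta E = \left(\frac{\partial E}{\partial \mathbf{w}}\right)^{T} \delta\mathbf{w}
         + \frac{1}{2}\,\delta\mathbf{w}^{T}\,\mathbf{H}\,\delta\mathbf{w}
         + O\!\left(\|\delta\mathbf{w}\|^{3}\right),
$$
where $\delta E = E(\mathbf{w} + \delta\mathbf{w}) - E(\mathbf{w})$ and
$$
\mathbf{H} = \frac{\partial^{2} E}{\partial \mathbf{w}^{2}}
$$
is the corresponding Hessian matrix.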
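The expansion can be checked numerically. The sketch below is illustrative only: the two-weight function `error` is a made-up smooth error surface (not taken from the text), and the gradient and Hessian are estimated by central finite differences instead of being derived from a network. It compares the exact change in error with the first- and second-order Taylor estimates for a small perturbation.

```python
import math

def error(w):
    """Hypothetical smooth error function of two weights (illustration only)."""
    return (math.tanh(w[0]) - 0.5) ** 2 + 0.5 * (w[0] * w[1] - 1.0) ** 2

def gradient(f, w, h=1e-5):
    """Central-difference estimate of the gradient dE/dw."""
    g = []
    for i in range(len(w)):
        wp, wm = list(w), list(w)
        wp[i] += h
        wm[i] -= h
        g.append((f(wp) - f(wm)) / (2.0 * h))
    return g

def hessian(f, w, h=1e-4):
    """Central-difference estimate of the Hessian H = d^2E/dw^2."""
    n = len(w)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            wpp, wpm, wmp, wmm = list(w), list(w), list(w), list(w)
            wpp[i] += h; wpp[j] += h
            wpm[i] += h; wpm[j] -= h
            wmp[i] -= h; wmp[j] += h
            wmm[i] -= h; wmm[j] -= h
            H[i][j] = (f(wpp) - f(wpm) - f(wmp) + f(wmm)) / (4.0 * h * h)
    return H

def taylor_delta_e(g, H, dw):
    """Second-order estimate: g^T dw + 0.5 * dw^T H dw."""
    n = len(dw)
    first = sum(g[i] * dw[i] for i in range(n))
    second = 0.5 * sum(dw[i] * H[i][j] * dw[j]
                       for i in range(n) for j in range(n))
    return first + second

if __name__ == "__main__":
    w = [0.3, 0.7]        # arbitrary weight vector
    dw = [0.01, -0.02]    # small perturbation
    g = gradient(error, w)
    H = hessian(error, w)
    exact = error([a + b for a, b in zip(w, dw)]) - error(w)
    first_order = sum(gi * di for gi, di in zip(g, dw))
    print("exact change in error :", exact)
    print("first-order estimate  :", first_order)
    print("second-order estimate :", taylor_delta_e(g, H, dw))
```

The gap between the exact change and the second-order estimate shrinks with the cube of the perturbation size, which is what the remainder term in the series expresses.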
Now, for a network trained to a local minimum of the error, the gradient
vanishes:
$$
\frac{\partial E}{\partial \mathbf{w}} = 0 .
$$
Neglecting all higher-order terms in the corresponding Taylor series and
eliminating a specific weight, say $w_{ij}$, measures should be undertaken to
minimize the increase in error $\delta E$, taking into account the condition of
weight elimination, as given by
$$
\delta w_{ij} + w_{ij} = 0 .
$$
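As a rough illustration of this constrained minimization, the sketch below fixes the perturbation of one weight to the negative of its value and chooses the remaining components so that the quadratic term $\frac{1}{2}\,\delta\mathbf{w}^{T}\mathbf{H}\,\delta\mathbf{w}$ is as small as possible. The Hessian `H`, the weight vector `w`, the flat index `q` (standing in for the pair $ij$), and all function names are assumptions made for the example. Solving a reduced linear system is just one way to carry out the minimization and is not necessarily the derivation the text goes on to give.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def quad_increase(H, dw):
    """Second-order increase in error at a minimum: 0.5 * dw^T H dw."""
    n = len(dw)
    return 0.5 * sum(dw[i] * H[i][j] * dw[j]
                     for i in range(n) for j in range(n))

def eliminate_weight(H, w, q):
    """Perturbation with dw[q] = -w[q] that minimizes 0.5 * dw^T H dw."""
    free = [i for i in range(len(w)) if i != q]
    # Stationarity in the free components: H_ff dw_f + H_fq dw_q = 0,
    # and dw_q = -w[q], so H_ff dw_f = H_fq * w[q].
    A = [[H[i][j] for j in free] for i in free]
    b = [H[i][q] * w[q] for i in free]
    dw_free = solve(A, b)
    dw = [0.0] * len(w)
    dw[q] = -w[q]
    for i, v in zip(free, dw_free):
        dw[i] = v
    return dw

if __name__ == "__main__":
    # Assumed symmetric positive-definite Hessian and weights (illustration only).
    H = [[2.0, 0.5, 0.0],
         [0.5, 1.0, 0.3],
         [0.0, 0.3, 1.5]]
    w = [0.8, -0.4, 1.2]
    q = 1
    dw = eliminate_weight(H, w, q)
    naive = [0.0, -w[q], 0.0]   # delete the weight, leave the others untouched
    print("adjusted perturbation :", dw)
    print("increase, adjusted    :", quad_increase(H, dw))
    print("increase, naive       :", quad_increase(H, naive))
```

Letting the remaining weights move can only lower the estimated increase in error compared with simply zeroing the chosen weight; with a diagonal Hessian the two coincide.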
The condition of weight elimination in vectorial form is given by
$$
\mathbf{e}_{ij}^{T}\,\delta\mathbf{w} + w_{ij} = 0,
$$
 