Where $w_{ij}$ is the weight of the connection from unit $i$ in the previous
layer to unit $j$, $O_i$ is the output value of unit $i$ from the previous
layer, and $\theta_j$ is the bias of unit $j$.
The output value $O_j$ of a hidden unit or output unit is computed by applying
an activation function to its input value $I_j$ (the weighted sum). Suppose the
activation function is the sigmoid function. We have:
$$O_j = \frac{1}{1 + e^{-I_j}}$$
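The forward computation of a single unit can be sketched in Python as follows. This is a minimal illustration, not code from the text: the function names `sigmoid` and `unit_output` and the sample weights and bias are assumptions chosen so that the weighted sum comes out to exactly zero.

```python
import math

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(prev_outputs, weights, bias):
    """Output O_j of one unit: sigmoid applied to the input I_j.

    I_j is the weighted sum of the previous layer's outputs O_i
    (weights w_ij) plus the bias theta_j.
    """
    net_input = sum(w * o for w, o in zip(weights, prev_outputs)) + bias
    return sigmoid(net_input)

# Hypothetical values: I_j = 0.2*1 + (-0.3)*0 + (-0.2) = 0, so O_j = 0.5.
print(unit_output([1.0, 0.0], [0.2, -0.3], -0.2))  # → 0.5
```

Because the sigmoid maps any real input into the interval (0, 1), an input of zero sits exactly at the midpoint 0.5.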
- Propagating the error backward: The error is propagated backward by
updating the weights and biases to reflect the error of the network's
prediction. If unit $j$ is an output unit, its error is computed as below:
$$\mathrm{Err}_j = O_j (1 - O_j)(T_j - O_j)$$
where $T_j$ is the known target value of unit $j$ for the given training tuple.
If unit $j$ is a hidden unit, its error takes into account the weighted sum of
the errors of the units connected to it in the next higher layer. The error of
a hidden unit is therefore computed as below:
$$\mathrm{Err}_j = O_j (1 - O_j) \sum_k \mathrm{Err}_k \, w_{jk}$$
Where $w_{jk}$ is the weight of the connection from unit $j$ to a unit $k$ in
the next higher layer and $\mathrm{Err}_k$ is the error of unit $k$.
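The two error formulas can be sketched in Python as below. The function names `output_error` and `hidden_error` and the sample numbers are illustrative assumptions, not part of the text; the sketch also assumes the sigmoid activation, which is where the factor O_j(1 - O_j) comes from.

```python
def output_error(o_j, t_j):
    """Error of an output unit: O_j (1 - O_j) (T_j - O_j)."""
    return o_j * (1.0 - o_j) * (t_j - o_j)

def hidden_error(o_j, next_errors, next_weights):
    """Error of a hidden unit: O_j (1 - O_j) * sum_k Err_k * w_jk.

    next_errors[k] is Err_k of unit k in the next higher layer and
    next_weights[k] is the weight w_jk from unit j to that unit k.
    """
    weighted_sum = sum(err_k * w_jk
                       for err_k, w_jk in zip(next_errors, next_weights))
    return o_j * (1.0 - o_j) * weighted_sum

# Hypothetical values: an output unit that produced 0.5 for target 1.
err_out = output_error(0.5, 1.0)
print(err_out)  # → 0.125

# A hidden unit with output 0.5 feeding that output unit through w_jk = 0.4.
print(hidden_error(0.5, [err_out], [0.4]))
```

The hidden unit's error is much smaller than the output unit's here, since it is scaled both by the connecting weight and by its own O_j(1 - O_j) factor.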
- Updating the weights and biases: The updates are based on the errors. The
weights are adjusted so as to reduce the error. Given that $\Delta w_{ij}$ is
the change in weight $w_{ij}$, the weight $w_{ij}$ is updated as below:
$$\Delta w_{ij} = (l)\,\mathrm{Err}_j \, O_i$$
$$w_{ij} = w_{ij} + \Delta w_{ij}$$
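Where $l$ is the learning rate, a constant between 0 and 1. A suitable
learning rate helps the search avoid getting stuck at a local minimum in
decision space and approach a global minimum. If the rate is too small,
learning is slow; if it is too large, the weights may oscillate between poor
solutions.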
The bias $\theta_j$ of unit $j$ is updated as below:
$$\Delta \theta_j = (l)\,\mathrm{Err}_j$$
$$\theta_j = \theta_j + \Delta \theta_j$$
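A minimal Python sketch of the two update rules follows. The names `update_weight` and `update_bias` and the sample values (learning rate 0.9, error 0.125) are assumptions made for illustration only.

```python
def update_weight(w_ij, err_j, o_i, rate):
    """Return w_ij + delta, where delta = rate * Err_j * O_i."""
    return w_ij + rate * err_j * o_i

def update_bias(theta_j, err_j, rate):
    """Return theta_j + delta, where delta = rate * Err_j."""
    return theta_j + rate * err_j

# Hypothetical values: learning rate l = 0.9, Err_j = 0.125, O_i = 1.0.
new_w = update_weight(0.2, 0.125, 1.0, rate=0.9)
new_theta = update_bias(-0.2, 0.125, rate=0.9)
print(new_w, new_theta)
```

Note that a weight whose source unit produced O_i = 0 is left unchanged by this rule, whereas the bias is always adjusted whenever Err_j is nonzero.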
- Terminating condition: Training stops when one of the following conditions
holds:
  - All $\Delta w_{ij}$ in some iteration are smaller than a given threshold.
  - All training tuples have been iterated through.
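The sketch below shows how such a stopping check might sit inside a training loop. To stay short it is deliberately simplified to a single sigmoid unit with no hidden layer, so only the output-unit error formula is used. The function name `train_single_unit`, the `threshold` and `max_epochs` defaults, and the OR-style training tuples are all illustrative assumptions; in particular, the cap on the number of passes is an added safeguard rather than a condition stated above.

```python
import math

def train_single_unit(tuples, rate=0.5, threshold=1e-4, max_epochs=10000):
    """Train one sigmoid unit and report how many passes were made.

    Each epoch is one pass through all training tuples. Training stops
    as soon as every weight change in an epoch is below `threshold`,
    or after `max_epochs` passes (an assumed safety cap).
    """
    weights = [0.0] * len(tuples[0][0])
    bias = 0.0
    for epoch in range(1, max_epochs + 1):
        largest_change = 0.0
        for inputs, target in tuples:
            net = sum(w * x for w, x in zip(weights, inputs)) + bias
            out = 1.0 / (1.0 + math.exp(-net))
            err = out * (1.0 - out) * (target - out)   # output-unit error
            for i, x in enumerate(inputs):
                delta = rate * err * x                 # delta w_ij
                weights[i] += delta
                largest_change = max(largest_change, abs(delta))
            bias += rate * err                         # delta theta_j
        if largest_change < threshold:
            break
    return weights, bias, epoch

# Hypothetical training tuples: (binary inputs, target output).
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias, epochs = train_single_unit(data)
print(weights, bias, epochs)
```

Tracking the largest absolute weight change per epoch gives a direct test of the first condition; a looser threshold ends training earlier at the cost of a less refined fit.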
4.3 Applying Neural Networks to Document Classification
Given a set of classes C = {computer science, math} and a set of terms T =
{computer, programming language, algorithm, derivative}, suppose that all input
variables (units) are binary, so that every document is represented as a set of
input variables. Each term is mapped to an input variable whose value is 1 if
the term occurs in the document and 0 otherwise. The input layer therefore
consists of four input units: “computer”, “programming language”, “algorithm”
and “derivative”.
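The mapping from a document to the four binary input units can be sketched as follows. The function name `encode_document` and the sample sentence are assumptions, and the case-insensitive substring test is a deliberately naive stand-in for whatever term matching a real system would use (it would, for example, also count "computers" as an occurrence of "computer").

```python
TERMS = ["computer", "programming language", "algorithm", "derivative"]

def encode_document(text, terms=TERMS):
    """Return one binary input value per term: 1 if present, else 0.

    Presence is checked with a naive case-insensitive substring test,
    which is an illustrative assumption, not the text's method.
    """
    lowered = text.lower()
    return [1 if term in lowered else 0 for term in terms]

doc = "An algorithm written in a programming language runs on a computer."
print(encode_document(doc))  # → [1, 1, 1, 0]
```

The resulting list is in the same order as the term set T, so each position corresponds to one input unit of the network.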