NOTE
The derivatives used in the Activation class implementations in the
code are not written directly as functions of x. Instead, the derivative
of f(x) is expressed in terms of f(x) itself, in this case as
1 − f(x) × f(x). To make the calculation of the derivative more
efficient, the value of f(x) is assigned to fx and then passed to the
"derivative" function in place of x itself. This representation of
activation-function derivatives is widely used in the neural network
literature. Unfortunately, the fact that the derivative is stated in
terms of the original function is rarely pointed out explicitly, and as
a result, examples of backpropagation found online often contain errors.
This form of the derivative is used because the feed-forward portion of
the neural network has already computed f(x) for each unit in the
network. Because that value must be stored anyway to compute the error
at each layer, it is convenient to state the derivative in terms of
f(x); otherwise, the original value of x would also need to be stored.
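As a concrete illustration, here is a minimal sketch of this pattern for a tanh activation. The interface and class shown are illustrative, not the book's actual code; only the method name f, the "derivative" function, and the fx convention are taken from the text above.

public interface Activation {
    double f(double x);           // forward pass: compute f(x)
    double derivative(double fx); // backward pass: f'(x) expressed in terms of fx = f(x)
}

public class Tanh implements Activation {
    public double f(double x) {
        return Math.tanh(x);
    }
    // Since f(x) = tanh(x), f'(x) = 1 - tanh(x)^2 = 1 - f(x) * f(x),
    // so the stored output fx is all that is needed; x itself is never revisited.
    public double derivative(double fx) {
        return 1.0 - fx * fx;
    }
}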
When data is fed to the layer, it must compute all of the weighted sums and
apply the activation function. The implementation here favors clarity over
performance; consequently, it is fairly inefficient, requiring
O(n × m) operations:
public double[] feed(double[] x) {
    for (int i = 0; i < v.length; i++) {
        double[] W = w[i];
        // Start from the bias weight (the bias input is always 1), or zero if there are no bias weights.
        v[i] = bW != null ? bW[i] : 0.0;
        // Accumulate the weighted sum of the inputs.
        for (int j = 0; j < W.length; j++) {
            v[i] += W[j] * x[j];
        }
        // Apply the activation function to the weighted sum.
        v[i] = fn.f(v[i]);
    }
    return v;
}
Because the bias input is always 1, the output value for each unit can be
initialized to its bias weight; if there are no bias weights, it is simply set
to zero. After the values have been updated, the function returns the new state
so that it can be passed to the next layer (or used as the output).
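For example, chaining the layers looks like the following. This is a hypothetical usage sketch; the variable names and the idea of a separate hidden and output layer object are assumptions, not taken from the book's code.

// Feed the input through two layers in sequence: each call returns the
// layer's activations, which become the next layer's input.
double[] hidden = hiddenLayer.feed(input);
double[] output = outputLayer.feed(hidden);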