NOTE
The derivatives used in the Activation class implementations in the
code are not written directly as functions of x. Instead, the derivative
of f(x) is expressed in terms of f(x) itself, in this case as
1 − f(x) × f(x). To make the calculation of the derivative more
efficient, the value of f(x) is assigned to fx and then passed to the
"derivative" function in place of x itself. This representation of
activation-function derivatives is widely used in the neural network
literature. Unfortunately, the fact that the derivative is stated in
terms of the original function is rarely pointed out explicitly, and as
a result, examples of backpropagation found online often contain errors.
This form of the derivative is used because the feed-forward portion of
the neural network has already computed f(x) for each unit in the
network. Because that value must be stored anyway to compute the error
at each layer, it is convenient to state the derivative in terms of
f(x); otherwise, the original value of x would also need to be stored.
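As a concrete illustration, here is a minimal sketch of this pattern for a tanh activation. The interface and class shown are illustrative, not the book's actual code; only the method name f, the "derivative" function, and the fx convention are taken from the text above.

public interface Activation {
    double f(double x);           // forward pass: compute f(x)
    double derivative(double fx); // backward pass: f'(x) expressed in terms of fx = f(x)
}

public class Tanh implements Activation {
    public double f(double x) {
        return Math.tanh(x);
    }
    // Since f(x) = tanh(x), f'(x) = 1 - tanh(x)^2 = 1 - f(x) * f(x),
    // so the stored output fx is all that is needed; x itself is never revisited.
    public double derivative(double fx) {
        return 1.0 - fx * fx;
    }
}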
When data is fed to the layer, it must compute all of the weighted sums and
apply the activation function. The implementation here favors clarity over
performance; consequently, it is fairly inefficient, requiring
O(n × m) operations:
public double[] feed(double[] x) {
    for (int i = 0; i < v.length; i++) {
        double[] W = w[i];
        // Start from the bias weight (the bias input is always 1), or zero if there are no bias weights.
        v[i] = bW != null ? bW[i] : 0.0;
        // Accumulate the weighted sum of the inputs.
        for (int j = 0; j < W.length; j++) {
            v[i] += W[j] * x[j];
        }
        // Apply the activation function to the weighted sum.
        v[i] = fn.f(v[i]);
    }
    return v;
}
Because the bias input is always 1, the output value for each unit can be
initialized to its bias weight; if there are no bias weights, it is simply set
to zero. After the values have been updated, the function returns the new state
so that it can be passed to the next layer (or used as the output).
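For example, chaining the layers looks like the following. This is a hypothetical usage sketch; the variable names and the idea of a separate hidden and output layer object are assumptions, not taken from the book's code.

// Feed the input through two layers in sequence: each call returns the
// layer's activations, which become the next layer's input.
double[] hidden = hiddenLayer.feed(input);
double[] output = outputLayer.feed(hidden);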