The err array already stores the product of the unit error and its derivative, so it
only needs to be multiplied by the previous layer's activation values, which are
passed into the following implementation as the array o :
public double[] update(double[] o, double r) {
    for (int i = 0; i < v.length; i++) {
        // Adjust the bias weight, if this layer has one.
        if (bW != null)
            bW[i] += r * err[i];
        double[] W = w[i];
        // Gradient-descent step: scale the stored error by the learning
        // rate and by the previous layer's activation.
        for (int j = 0; j < W.length; j++)
            W[j] += r * err[i] * o[j];
    }
    return v; // this layer's output values
}
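Concretely, the inner loop applies the gradient-descent rule w[i][j] += r * err[i] * o[j] to every weight. The following standalone snippet works through one such step with made-up numbers; the values are purely illustrative and are not taken from the example network:

public class UpdateStep {
    public static void main(String[] args) {
        double r = 0.5;   // learning rate
        double err = 0.1; // err[i]: unit error times activation derivative
        double o = 2.0;   // o[j]: previous layer's activation
        double w = 0.3;   // weight w[i][j] before the update

        w += r * err * o; // 0.3 + 0.5 * 0.1 * 2.0 = 0.4
        System.out.println("updated weight: " + w);
    }
}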
The other value passed to the update method, r , is a “learning rate” similar
to the one used in the LogisticRegression example. It keeps each weight
adjustment from moving too far in a single step, which helps the stability of
the gradient descent method. Typical values for most networks are between 0.2
and 0.8, and finding the best rate usually takes some trial and error.
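To see why a rate that is too large hurts stability, consider gradient descent on the one-dimensional error surface E(w) = w^2, whose gradient is 2w. The toy program below is not part of the book's network; it only illustrates how small rates shrink the error while a rate above 1.0 makes every step overshoot further than the last:

public class RateDemo {
    // Run 20 gradient-descent steps on E(w) = w^2, starting from w = 1.
    static double descend(double r) {
        double w = 1.0;
        for (int step = 0; step < 20; step++)
            w -= r * 2 * w; // step against the gradient, scaled by r
        return w;           // distance remaining from the minimum at 0
    }

    public static void main(String[] args) {
        for (double r : new double[] {0.1, 0.5, 0.9, 1.1})
            System.out.printf("r = %.1f -> w after 20 steps = %g%n", r, descend(r));
    }
}

On this surface each step multiplies w by (1 - 2r), so any rate above 1.0 grows the weight without bound; that runaway behavior is exactly what a modest learning rate guards against.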
Finally, the weights need to be initialized before learning can begin. By
default, all of the weights in a network start with a value of zero, but this
can leave the network trapped in a local minimum, unable to reach the “best”
network that fits the data. Usually, it is best to randomize the weights
before training, as shown in the following code, which is added to the
Layer implementation:
public void initialize(Random rng) {
    for (int i = 0; i < v.length; i++) {
        // Draw each weight uniformly from [-1, 1).
        for (int j = 0; j < w[i].length; j++)
            w[i][j] = 2 * rng.nextDouble() - 1;
        // Skip the bias weights if this layer has none, as update() does.
        if (bW != null)
            bW[i] = 2 * rng.nextDouble() - 1;
    }
}

public void initialize() {
    initialize(new Random());
}
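Passing the Random into initialize also makes experiments repeatable: a fixed seed such as new Random(42) produces the same starting weights on every run, which is useful when comparing learning rates, while the no-argument overload simply delegates to a freshly seeded generator.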