normalized variables $X_i$. The initial values of the parameters should be drawn
from a centered distribution, whose variance is to be determined. The parameter
related to the bias is equal to zero; the potential $v = \sum_{i=1}^{n-1} w_i x_i$ of each
neuron is thus the sum of $n-1$ random variables that are the products of
independent random variables, with zero mean, having the same distribution.
It can be shown, from the elements of statistics provided at the beginning of
the chapter, that one has
$$\mathrm{var}(V) = (n-1)\,\mathrm{var}(W_i)\,\mathrm{var}(X_i),$$
with $\mathrm{var}(X_i) = 1$ since the variables have been normalized prior to training.
Thus, if the desired variance of the potential is 1, the initial values of the
parameters must be drawn from a centered distribution of variance $1/(n-1)$.
For instance, it may be convenient to choose a uniform distribution between
$-w_{\max}$ and $+w_{\max}$: $\mathrm{var}(W_i) = w_{\max}^2/3$, hence $w_{\max} = \sqrt{3/(n-1)}$.
The above discussion is valid for multilayer Perceptrons. For RBF or
wavelet networks, the initialization problem is more critical, because those
are localized functions; if they are initially located far from the domain of
interest, or if their extension (standard deviation or dilation) is not appropriate,
training will generally fail. The result of the teacher-student problem,
described in the next section, depends critically on initialization for localized
functions. The following strategy, described in detail in [Oussar et al. 2002],
should be implemented: a large library of RBFs or wavelets is created, and
a selection method, analogous to the input selection methods described in a
previous section, is applied. Training is subsequently applied to the wavelets
or RBFs that were thus selected.
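For multilayer Perceptrons, the initialization rule discussed above (zero bias, weights drawn uniformly between $-w_{\max}$ and $+w_{\max}$ with $w_{\max} = \sqrt{3/(n-1)}$) can be checked numerically. The following is a minimal sketch in plain Python, not a reference implementation: the function names, the choice n = 11, the Gaussian inputs, and the sample counts are all assumptions made for illustration.

```python
import math
import random

def init_neuron_parameters(n, rng):
    """Return (weights, bias) for one neuron with n - 1 variables and a bias."""
    # A uniform distribution on [-w_max, +w_max] has variance w_max**2 / 3,
    # so this choice gives each weight a variance of 1 / (n - 1).
    w_max = math.sqrt(3.0 / (n - 1))
    weights = [rng.uniform(-w_max, w_max) for _ in range(n - 1)]
    bias = 0.0  # the parameter related to the bias is initialized to zero
    return weights, bias

def potential(weights, bias, x):
    """Weighted sum of the inputs plus the bias."""
    return bias + sum(w * xi for w, xi in zip(weights, x))

# Empirical check: pool the potentials of many freshly initialized neurons,
# each fed with normalized (zero mean, unit variance) input vectors.
rng = random.Random(0)
n = 11  # hypothetical size: 10 variables plus a bias
samples = []
for _ in range(300):
    weights, bias = init_neuron_parameters(n, rng)
    for _ in range(200):
        x = [rng.gauss(0.0, 1.0) for _ in range(n - 1)]
        samples.append(potential(weights, bias, x))

mean = sum(samples) / len(samples)
potential_variance = sum((v - mean) ** 2 for v in samples) / len(samples)
print(f"empirical variance of the potential: {potential_variance:.3f}")
```

The printed variance should be close to 1. Note that the variance is pooled over many independent draws of the weights: for a single fixed weight vector, the variance of the potential over the inputs is the sum of the squared weights, which equals 1 only on average.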
How to Test a Training Algorithm: The Teacher-Student Problem
The experience gained during years of teaching and research shows that it is
very easy to design a faulty training algorithm, or to write a faulty training
program, that nevertheless converges, sometimes very slowly, and produces a
model that is not completely ridiculous. Algorithmic or software errors may
pass unnoticed if care is not exercised. Therefore, it is important to test the
validity of an algorithm or of a program that one has written or downloaded
for free from the Web.
The following procedure, known as the teacher-student problem, is convenient
and simple to implement. A network is created (the teacher), whose
parameters are random. That network is used for generating a training set, by
using random inputs, and computing the corresponding outputs. That data
set is used for training a second network (the student), which has the same
number of inputs and of hidden neurons as the teacher network. If the training
algorithm and the computer program are correct, the parameters of the
teacher network should be retrieved by the student within roundoff errors: the
mean square error is on the order of $10^{-30}$, and each parameter of the student
should be equal to a parameter of the teacher network, within roundoff
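
The teacher-student procedure can be sketched as a small test harness. The sketch below is written in plain Python under several assumptions that are not in the text: a single hidden layer of tanh neurons with one linear output, parameters and network sizes chosen arbitrarily, and plain full-batch gradient descent standing in for the training algorithm under test. All function names are hypothetical.

```python
import math
import random

def random_network(n_inputs, n_hidden, rng):
    """One hidden layer of tanh neurons and one linear output neuron."""
    return {
        "W1": [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_hidden)],
        "b1": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b2": rng.uniform(-1, 1),
    }

def forward(net, x):
    """Return the network output and the hidden-neuron outputs for input x."""
    hidden = [math.tanh(b + sum(w * xi for w, xi in zip(row, x)))
              for row, b in zip(net["W1"], net["b1"])]
    output = net["b2"] + sum(w * h for w, h in zip(net["w2"], hidden))
    return output, hidden

def mean_square_error(net, inputs, targets):
    return sum((forward(net, x)[0] - t) ** 2
               for x, t in zip(inputs, targets)) / len(inputs)

def gradient_step(net, inputs, targets, rate):
    """One full-batch gradient-descent step on the mean square error."""
    n_hidden, n_inputs, n_samples = len(net["w2"]), len(net["W1"][0]), len(inputs)
    g_W1 = [[0.0] * n_inputs for _ in range(n_hidden)]
    g_b1 = [0.0] * n_hidden
    g_w2 = [0.0] * n_hidden
    g_b2 = 0.0
    for x, t in zip(inputs, targets):
        y, hidden = forward(net, x)
        d_out = 2.0 * (y - t) / n_samples  # derivative of the MSE w.r.t. the output
        g_b2 += d_out
        for j in range(n_hidden):
            g_w2[j] += d_out * hidden[j]
            # Backpropagate through tanh: d tanh(v) / dv = 1 - tanh(v)**2
            d_pot = d_out * net["w2"][j] * (1.0 - hidden[j] ** 2)
            g_b1[j] += d_pot
            for i in range(n_inputs):
                g_W1[j][i] += d_pot * x[i]
    for j in range(n_hidden):
        net["w2"][j] -= rate * g_w2[j]
        net["b1"][j] -= rate * g_b1[j]
        for i in range(n_inputs):
            net["W1"][j][i] -= rate * g_W1[j][i]
    net["b2"] -= rate * g_b2

rng = random.Random(0)
# Teacher: random parameters; training set: random inputs, teacher outputs.
teacher = random_network(n_inputs=2, n_hidden=3, rng=rng)
inputs = [[rng.gauss(0, 1) for _ in range(2)] for _ in range(200)]
targets = [forward(teacher, x)[0] for x in inputs]

# Student: same numbers of inputs and hidden neurons, different random start.
student = random_network(n_inputs=2, n_hidden=3, rng=rng)
mse_before = mean_square_error(student, inputs, targets)
for _ in range(500):
    gradient_step(student, inputs, targets, rate=0.05)
mse_after = mean_square_error(student, inputs, targets)
print(f"mean square error: {mse_before:.3e} -> {mse_after:.3e}")
```

A few hundred steps of simple gradient descent only reduce the error; they are not expected to reach the roundoff level quoted in the text, which in practice calls for a second-order training algorithm run to convergence. When comparing student and teacher parameters, keep in mind that hidden neurons may be recovered in a different order, and that with tanh neurons a hidden neuron and its sign-flipped counterpart (all incoming parameters and the outgoing weight negated) compute the same function.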