The parameters of the connections between the variables of the model and the inputs of the hidden neurons control the slope of the sigmoids of the hidden neurons.
The parameters of the connections between the constant input (bias) and the inputs of the hidden neurons generate a horizontal shift of the sigmoids of the hidden neurons.
The parameters of the connections between the hidden neurons and the inputs of the output neurons control the influence of each hidden neuron on the outputs.
The parameters of the connections between the bias and the output neurons generate a vertical shift of the output of the network.
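These geometric roles are easy to check numerically. The short sketch below (plain NumPy, with made-up weight values) shows how the input-to-hidden weight sets the slope of a hidden neuron's sigmoid and how the bias shifts it horizontally.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-5.0, 5.0, 11)

# The input-to-hidden weight w controls the slope of the sigmoid:
# larger |w| gives a steeper transition around the origin.
shallow = sigmoid(0.5 * x)
steep = sigmoid(5.0 * x)

# The bias b shifts the sigmoid horizontally: sigmoid(w*x + b)
# crosses 0.5 at x = -b/w instead of at x = 0.
shifted = sigmoid(1.0 * x + 2.0)  # midpoint moved to x = -2
```

Both curves still pass through 0.5 at their midpoint; only the steepness and the location of that midpoint change, which is exactly the role of the two groups of parameters described above.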
Therefore, it is natural to use different hyperparameters for those different types of parameters [MacKay 1992]. Then the cost function becomes
\[
J' = J + \frac{\alpha_0}{2}\sum_{w_i \in W_0} w_i^2
       + \frac{\alpha_1}{2}\sum_{w_i \in W_1} w_i^2
       + \frac{\alpha_2}{2}\sum_{w_i \in W_2} w_i^2,
\]
where W 0 is the set of parameters between the bias and the hidden neurons,
where W 1 is the set of parameters between the inputs and the hidden neurons,
and W 2 is the set of parameters of the inputs of the output neuron (including
the bias of the output neuron). Therefore, the values of the three hyperparameters α0, α1, α2 must be found. A principled statistical method was proposed in [MacKay 1992], but it relies on numerous assumptions and requires demanding computations. In practice, the values of the hyperparameters are not very critical; a heuristic approach, consisting of performing several trainings with different hyperparameters, is frequently sufficient.
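As an illustration of the cost function above, the following sketch (plain NumPy, with made-up weights and hyperparameter values) adds a separate weight-decay term for each of the three parameter groups W0, W1 and W2.

```python
import numpy as np

def regularized_cost(J, groups, alphas):
    """Add one weight-decay term per parameter group.

    groups : dict mapping a group name to an array of weights,
             e.g. 'W0' (bias -> hidden), 'W1' (inputs -> hidden),
             'W2' (inputs of the output neuron).
    alphas : dict mapping the same names to their hyperparameters.
    """
    penalty = sum(alphas[k] / 2.0 * np.sum(groups[k] ** 2) for k in groups)
    return J + penalty

# Illustrative (made-up) weights and hyperparameters:
groups = {
    "W0": np.array([0.1, -0.3]),       # bias -> hidden neurons
    "W1": np.array([1.2, -0.7, 0.4]),  # inputs -> hidden neurons
    "W2": np.array([0.8, -0.5]),       # inputs of the output neuron
}
alphas = {"W0": 0.01, "W1": 0.1, "W2": 0.05}

J_prime = regularized_cost(1.0, groups, alphas)
```

The heuristic approach mentioned above amounts to re-running training with several candidate values for the entries of `alphas` and keeping the combination that generalizes best.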
We illustrate this discussion on an example of a real application, from
[Stricker 2000].
Example
The application is a filtering task, as outlined in Chap. 1. In a corpus of texts
(press releases of the Agence France Presse), the texts that are relevant to
a given topic should be selected automatically. It is essentially a two-class
problem: a press release is either relevant or irrelevant. A training set of 1,400
relevant press releases and 8,000 irrelevant ones is available. The performance
measure is a quantity F that combines the precision of the classifier (the fraction of the documents considered relevant by the classifier that are really relevant) and its recall (the fraction of the relevant documents present in the database that the classifier considers relevant). The better the performance, the larger the value of F.
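A common way to combine precision and recall into a single score is their harmonic mean (the F1 measure); the exact combination used in [Stricker 2000] is not spelled out here, so the sketch below assumes F1.

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall (the usual F1 score).
    # It is high only when both precision and recall are high.
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

For instance, a classifier that retrieves every document has recall 1 but poor precision, and a classifier that retrieves almost nothing can have precision 1 but poor recall; the harmonic mean penalizes both extremes.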
A linear classifier is used, i.e., a neural network with no hidden neurons and an output neuron with a sigmoid activation function. Since there are no hidden units, the number of parameters cannot be decreased without changing the data representation. Since it is not desired to change the latter (which