Akaike's Information Criterion
In the above tests, the performance of the models is estimated through the
mean square error on a set of examples. It may be desirable, for models that
have similar performances, to take into account the complexity of the model,
since the simplest models are generally preferable, as discussed in Chap. 1.
Akaike's criterion [Akaike 1973, 1974; Norton 1986] is an example of such
an approach. It consists of choosing the model for which the AIC (Akaike
Information Criterion) is smallest,
AIC = N log(MSE) + 2(q + 1),
where N is the number of examples, q is the number of variables of the model
(linear with respect to the parameters), and where MSE is the mean square
error on a data set. Thus, for a given performance as expressed by the mean
square error, the most parsimonious models are favored.
A large number of variants of that criterion are discussed in [McQuarrie
et al. 1998].
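As an illustration of how the criterion trades accuracy against complexity, the following minimal sketch computes the AIC for two candidate models and keeps the one with the smallest value. The natural logarithm is assumed, since the text does not state the base; the function name aic, the model names, and the numerical values are hypothetical and serve only as an example.

```python
import math


def aic(mse: float, n_examples: int, n_variables: int) -> float:
    """Return N log(MSE) + 2(q + 1); the natural logarithm is an assumption."""
    if mse <= 0 or n_examples <= 0:
        raise ValueError("mse and n_examples must be strictly positive")
    return n_examples * math.log(mse) + 2 * (n_variables + 1)


# Hypothetical candidates: (name, mean square error, number of variables q).
candidates = [
    ("small", 0.105, 3),
    ("large", 0.100, 8),
]
N = 100  # number of examples, made up for the illustration

scores = {name: aic(mse, N, q) for name, mse, q in candidates}
best = min(scores, key=scores.get)  # model with the smallest AIC

for name, value in scores.items():
    print(f"{name}: AIC = {value:.2f}")
print("selected:", best)
```

In this made-up case the larger model has a slightly lower mean square error, but the penalty 2(q + 1) outweighs the gain, so the more parsimonious model is selected.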
2.4.2.3 Variable Selection by the Probe Feature Method
The selection method that is described in the present section is intuitive,
efficient, and based on simple principles [Stoppiglia et al. 2003]. It proceeds
in two steps:
- ranking of the variables in order of decreasing relevance to the output,
- elimination of irrelevant variables.
We describe those two steps below.
Input Ranking through Gram-Schmidt Orthogonalization (Orthogonal Forward Regression)
In order to select the inputs of a neural model, it is convenient to perform
input selection with a model that is linear with respect to its parameters (a
polynomial model for instance), and to use the inputs thus selected as inputs
of a neural network, because input selection is easier for a model that is linear
with respect to its parameters.
Assume that p candidate variables (called primary variables) x_i (i = 1
to p) are available, after discussions with the experts of the process to be
modeled. If a nonlinear model is deemed necessary, one may consider, for
instance, a polynomial model of degree 2; such a model is linear with respect
to its parameters, its inputs being
- all combinations of 2 variables among the p candidate variables,
- the p candidate variables,
- a constant term.
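The construction of the inputs of such a degree-2 polynomial model can be sketched as follows. This is only an illustration under stated assumptions: "combinations of 2 variables" is read literally as the products x_i x_j with i < j, so squared terms are not generated, and the function name degree2_inputs is hypothetical.

```python
from itertools import combinations
from typing import List, Sequence


def degree2_inputs(x: Sequence[float]) -> List[float]:
    """Build the inputs of a degree-2 polynomial model from p primary variables.

    Returns the constant term, the p primary variables, and the products
    x_i * x_j for every pair i < j (assumption: no squared terms).
    """
    pair_products = [a * b for a, b in combinations(x, 2)]
    return [1.0] + list(x) + pair_products


# Example with p = 3 primary variables (values made up for the illustration).
inputs = degree2_inputs([2.0, 3.0, 5.0])
print(inputs)       # [1.0, 2.0, 3.0, 5.0, 6.0, 10.0, 15.0]
print(len(inputs))  # 1 + p + p(p - 1)/2 = 7
```

Under this reading, the number of candidate inputs grows as 1 + p + p(p - 1)/2, which is one reason why ranking the inputs and discarding the irrelevant ones becomes useful as p increases.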