$$\mathrm{AIC} = N k \ln\!\left(V^2\right) + 2K,$$
where N is the number of training data points, k is the number of output units of the network, $V^2$ is the maximum likelihood estimate of the mean square error for the training data, and K is the number of model parameters.
The application principle of the AIC is that, if two models achieve the same mean square error on a training data set, the smaller model should be selected. Equivalently, from a set of candidate models, the one with the smallest AIC value is selected (Ishikawa and Moriyama, 1996; Anders and Korn, 1999). This, however, requires the whole set of candidate models to be built and their parameters estimated before the selection rule can be applied.
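As an illustration, the following minimal sketch applies this rule to a handful of hypothetical candidate networks; the sample size, error estimates, and parameter counts are invented for the example, and the `aic` helper simply evaluates the formula above.

```python
import math

def aic(N, k, V2, K):
    """AIC = N*k*ln(V^2) + 2*K, with V2 the maximum likelihood
    estimate of the mean square error on the training data."""
    return N * k * math.log(V2) + 2 * K

# Hypothetical candidates: name -> (mean square error V^2, parameter count K).
candidates = {"small": (0.120, 25), "medium": (0.095, 60), "large": (0.094, 140)}

N, k = 500, 1  # assumed training set size and number of output units
scores = {name: aic(N, k, V2, K) for name, (V2, K) in candidates.items()}
best = min(scores, key=scores.get)  # the smallest AIC wins
print(scores, "->", best)
```

Note how the largest model is penalized: its marginal reduction in error does not offset the 2K term.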
Unfortunately, direct application of the AIC to neural networks is rather cumbersome. It is, however, made easier by the network information criterion (NIC) of Stone (1977),
$$\mathrm{NIC} = -\frac{1}{T} \ln L(\hat{w}) + \frac{1}{T}\,\mathrm{tr}\!\left[ B A^{-1} \right],$$
which is a generalization of the AIC. Here T is the number of training examples, and the first term is built from $\ln L(\hat{w})$, the estimated maximum of the logarithmic likelihood. The matrices A and B are defined as
$$A = -E\!\left\{ \nabla \nabla^{\mathsf{T}} \ln L_t \right\}, \qquad B = E\!\left\{ \nabla \ln L_t \, \nabla^{\mathsf{T}} \ln L_t \right\},$$

where $\nabla$ denotes the gradient with respect to the model parameters $w$ and $L_t$ is the likelihood contribution of the $t$-th training example.
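A numeric sketch of these quantities follows, using a linear-Gaussian model in place of a neural network so that the per-example gradients and Hessians of $\ln L_t$ have closed forms; all data, dimensions, and names are invented for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data from a linear-Gaussian model (an illustrative stand-in for a
# network): y_t = x_t @ w_true + noise, so
# ln L_t(w) = -0.5 * (y_t - x_t @ w)**2 / s2 + const.
T, K = 200, 3
X = rng.normal(size=(T, K))
w_true = np.array([1.0, -2.0, 0.5])
s2 = 0.25                                     # assumed known noise variance
y = X @ w_true + rng.normal(scale=np.sqrt(s2), size=T)

w_hat = np.linalg.lstsq(X, y, rcond=None)[0]  # maximum likelihood estimate
r = y - X @ w_hat                             # residuals at w_hat

# A = -E{grad grad^T ln L_t}: sample average of per-example Hessians.
A = (X.T @ X) / (T * s2)
# B = E{grad ln L_t grad^T ln L_t}: outer products of per-example scores.
scores = (r[:, None] * X) / s2                # per-example gradients of ln L_t
B = scores.T @ scores / T

log_lik = -0.5 * np.sum(r**2) / s2 - 0.5 * T * np.log(2 * np.pi * s2)
penalty = np.trace(B @ np.linalg.inv(A))      # approaches K = 3 here
nic = -log_lik / T + penalty / T
print("tr[B A^-1] =", penalty, " NIC =", nic)
```

Because the toy data really do come from the fitted model class, the computed $\mathrm{tr}[BA^{-1}]$ lands close to K, which anticipates the asymptotic result below.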
If the classes of models investigated include the true model, then it holds
asymptotically that A=B and
$$\mathrm{tr}\!\left[ B A^{-1} \right] = \mathrm{tr}\!\left[ I \right] = K,$$
where K is, again, the number of model parameters. In this case the NIC takes the
form
$$\mathrm{NIC} = -\frac{1}{T} \ln L(\hat{w}) + \frac{K}{T}.$$
This is similar to the AIC, which in this notation becomes
$$\mathrm{AIC} = -\frac{2}{T} \ln L(\hat{w}) + \frac{2K}{T}.$$
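Since both criteria are used only to rank candidate models, the constant factor is immaterial: in this form the AIC is exactly twice the NIC, as a quick check confirms (the numbers below are illustrative).

```python
def nic_value(log_lik, T, K):
    """NIC = -ln L(w)/T + K/T, under the asymptotic A = B simplification."""
    return -log_lik / T + K / T

def aic_value(log_lik, T, K):
    """AIC = -2 ln L(w)/T + 2K/T in the same scaling."""
    return -2.0 * log_lik / T + 2.0 * K / T

log_lik, T, K = -310.0, 200, 3  # illustrative maximized log-likelihood
print(nic_value(log_lik, T, K))  # 1.565
print(aic_value(log_lik, T, K))  # 3.130, exactly twice the NIC
```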