test set or validation set. We then use the training set to fit the machine, and
evaluate its performance on the test set. (This is an instance of resampling our
data, which is a useful trick in many contexts.) Because we've made sure the test
set is independent of the training set, we get an unbiased estimate of the out-of-
sample performance.
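To make this concrete, here is a minimal sketch in Python with NumPy. The sine-curve data, the 200-point sample, the 150/50 split, and the cubic polynomial standing in for the learning machine are all invented for illustration, not taken from the text.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical data: a noisy curve we would like to learn.
    x = rng.uniform(0.0, 1.0, size=200)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)

    # Randomly split the data into a training set and a held-out test set.
    idx = rng.permutation(x.size)
    train, test = idx[:150], idx[150:]

    # Fit the "machine" (here a cubic polynomial) on the training set only.
    coeffs = np.polyfit(x[train], y[train], deg=3)

    # Score it on the independent test set; because the test set played no
    # part in the fitting, this estimates the out-of-sample error.
    test_mse = np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2)
    print(f"held-out test MSE: {test_mse:.3f}")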
In cross-validation, we divide our data into random training and test sets
many different ways, fit a different machine for each training set, and compare
their performances on their test sets, taking the one with the best test-set per-
formance. This reintroduces some bias—it could happen by chance that one test
set reproduces the sampling quirks of its training set, favoring the model fit to
the latter. But cross-validation generally reduces over-fitting, compared to sim-
ply minimizing the empirical error; it makes more efficient use of the data,
though it cannot get rid of sampling noise altogether.
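A sketch of one version of this procedure, continuing the toy example above: the data are divided into random training and test sets several times, a different machine (here, a polynomial of a different degree) is fit to each training set, and the candidate with the best average test-set performance is kept. The split sizes and the range of degrees are assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    # The same kind of invented data as before.
    x = rng.uniform(0.0, 1.0, size=200)
    y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)

    def cv_score(degree, n_splits=5):
        """Average held-out MSE for a polynomial of the given degree,
        over several random training/test divisions of the data."""
        scores = []
        for _ in range(n_splits):
            idx = rng.permutation(x.size)
            train, test = idx[:150], idx[150:]
            coeffs = np.polyfit(x[train], y[train], deg=degree)
            scores.append(np.mean((np.polyval(coeffs, x[test]) - y[test]) ** 2))
        return np.mean(scores)

    # Keep the candidate machine with the best average test-set performance.
    degrees = range(1, 10)
    best = min(degrees, key=cv_score)
    print("degree chosen by cross-validation:", best)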
2.1.2. Regularization or Penalization
I said that the problem of minimizing the error is ill-posed, meaning that
small changes in the errors can lead to big changes in the optimal parameters. A
standard approach to ill-posed problems in optimization theory is called regu-
larization. Rather than trying to minimize L̂(R) alone, we minimize

    L̂(R) + M d(R),    [1]

where d(R) is a regularizing or penalty function. Remember that L̂(R) = E[L(R)]
+ F, where F is the sampling noise. If the penalty term is well-designed, then the
R which minimizes

    E[L(R)] + F + M d(R)    [2]

will be close to the R that minimizes E[L(R)]; it will cancel out the effects of
favorable fluctuations. As we acquire more and more data, F → 0, so M, too, goes
to zero at an appropriate pace, and the penalized solution will converge on the
machine with the best possible generalization error.
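One concrete and widely used instance of this scheme is ridge regression for linear machines, where the penalty d(R) is the squared length of the parameter vector and the penalized criterion has a closed-form minimizer. The sketch below is illustrative only: the data are invented, and the 1/n schedule for the penalty weight M simply stands in for the requirement that M shrink as more data are acquired.

    import numpy as np

    rng = np.random.default_rng(2)

    def ridge_fit(X, y, penalty):
        # Minimize the empirical squared error plus penalty * ||w||^2,
        # i.e. a quadratic penalty d(w); this is the ridge-regression solution.
        n_features = X.shape[1]
        return np.linalg.solve(X.T @ X + penalty * np.eye(n_features), X.T @ y)

    # An over-parameterized design: many columns, few of them informative.
    n, p = 50, 20
    X = rng.normal(size=(n, p))
    true_w = np.zeros(p)
    true_w[:3] = [2.0, -1.0, 0.5]
    y = X @ true_w + rng.normal(scale=0.5, size=n)

    # Let the penalty weight shrink as more data come in (here M = 1/n), so the
    # penalized solution approaches the unpenalized optimum in the limit.
    penalty = 1.0 / n
    w_hat = ridge_fit(X, y, penalty)
    print("penalized estimates of the first coefficients:", np.round(w_hat[:3], 2))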
How then should we design penalty functions? The more knobs and dials
there are on our machine, the more opportunities we have to get into mischief by
matching chance quirks in the data. If one machine has fifty knobs and another
fits the data just as well but has only a single knob, we should (the story goes)
choose the latter: because it is less flexible, the fact that it does well is a good in-
dication that it will still do well in the future. There are thus many regularization
methods that add a penalty proportional to the number of knobs, or, more for-
mally, the number of parameters. These include the Akaike information crite-
rion or AIC (23) and the Bayesian information criterion or BIC (24,25). Other