should not be multiplied beyond necessity," or, "the model should be no rougher
than necessary." This takes complexity to be a property of an individual model,
and the hope is that a simple model that can predict the training data will also be
able to predict new data. Under many circumstances, one can prove that, as the
size of a sample approaches infinity, regularization will converge on the correct
model, the one with the best generalization performance (26). But one can often
prove exactly the same thing about ERM without any regularization or penaliza-
tion at all; this is what the VC bounds [5] accomplish. While regularization
methods often do well in practice, so, too, does straight ERM. If we compare the
performance of regularization methods to straight empirical error minimization
on artificial examples, where we can calculate the generalization performance
exactly, regularization sometimes conveys no clear advantage at all (45).
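As a concrete illustration of the kind of comparison meant here, the following sketch (an addition to this excerpt, not the authors' own experiment) fits a linear model to artificial data twice, once by straight empirical risk minimization (ordinary least squares) and once with a ridge penalty, then estimates each fit's generalization error on a very large independent test sample; on well-behaved problems like this one, the two often come out nearly even.

    import numpy as np

    rng = np.random.default_rng(0)

    # Artificial data: y = X @ w_true + noise, so generalization error can be
    # estimated to high accuracy on a very large held-out sample.
    n_train, n_test, d = 100, 100_000, 10
    w_true = rng.normal(size=d)

    def make_data(n):
        X = rng.normal(size=(n, d))
        return X, X @ w_true + rng.normal(scale=1.0, size=n)

    X_tr, y_tr = make_data(n_train)
    X_te, y_te = make_data(n_test)

    # Straight ERM: ordinary least squares, no penalty.
    w_erm = np.linalg.lstsq(X_tr, y_tr, rcond=None)[0]

    # Regularized fit: ridge regression with penalty strength lam.
    lam = 1.0
    w_ridge = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)

    def gen_error(w):
        # Mean squared error on the held-out sample approximates the true risk.
        return np.mean((X_te @ w - y_te) ** 2)

    print("ERM   generalization MSE:", gen_error(w_erm))
    print("ridge generalization MSE:", gen_error(w_ridge))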
Contrast this with what happens in structural risk minimization. There our
complexity penalty depends solely on the VC dimension of the class of models
we're using. A simple, inflexible model which we find only because we're look-
ing at a complex, flexible class is penalized just as much as the most wiggly
member of that class. Experimentally, SRM does work better than simple ERM,
or than traditional penalization methods.
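To make the contrast concrete, here is a minimal sketch (added to this excerpt, and using one standard textbook form of the VC confidence term rather than the bound cited above) of how SRM scores candidate model classes: each class's empirical risk is inflated by a penalty that depends only on the VC dimension of the class, and the class with the smallest resulting bound is chosen.

    from math import log, sqrt

    def vc_penalty(h, n, delta=0.05):
        # One common textbook form of the VC confidence term (an assumption
        # here; the equation the chapter cites is not reproduced in this
        # excerpt). With probability at least 1 - delta, the true risk is at
        # most the empirical risk plus this quantity.
        return sqrt((h * (log(2 * n / h) + 1) + log(4 / delta)) / n)

    def srm_select(candidates, n):
        # candidates: (name, empirical_risk, vc_dimension) triples.
        # SRM minimizes the guaranteed bound, not the training error alone.
        return min((emp + vc_penalty(h, n), name) for name, emp, h in candidates)

    # Hypothetical classes: the flexible ones fit the training data better,
    # but their larger VC dimension inflates the bound.
    classes = [("linear", 0.12, 5), ("quadratic", 0.10, 20), ("spline", 0.05, 200)]
    bound, chosen = srm_select(classes, n=1000)
    print(chosen, "selected, with risk bound about", round(bound, 3))

With these made-up numbers the least flexible class wins, even though it has the largest training error.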
A simple example may help illuminate why this is so. Suppose we're inter-
ested in binary classification, and we find a machine R that correctly classifies a
million independent data points. If the real error rate (= generalization error) for
R was one in a hundred thousand, the chance that it would correctly classify a
million data points would be (0.99999)^(10^6) ≈ 4.5 × 10^-5. If R was the very
first parameter setting we checked, we could be quite confident that its true
error rate was much less than 10^-5, no matter how complicated the function
f(X, R) looked.
But if we've looked at ten million parameter settings before finding R, then the
odds are quite good that, among the machines with an error rate of 10^-5, we'd
find several that correctly classify all the points in the training set, so the fact
that R does is not good evidence that it's the best machine. 9 What matters is not
how much algebra is involved in making the predictions once we've chosen R,
but how many alternatives to R we've tried out and rejected. The VC dimension
lets us apply this kind of reasoning rigorously and without needing to know the
details of the process by which we generate and evaluate models.
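The arithmetic in this example is easy to check; the short computation below (an addition to this excerpt, using the figures quoted in the text) reproduces the chance that a single machine with true error rate 10^-5 survives a million test points, and the expected number of such lucky survivors if, say, ten million candidate settings all had that error rate.

    # Chance that one machine with true error rate 1e-5 correctly classifies
    # a million independent points: (1 - 1e-5) ** 1e6, about 4.5e-5.
    p_survive = (1 - 1e-5) ** 1_000_000
    print(f"single-machine survival probability: {p_survive:.2e}")

    # Searching ten million parameter settings: if (hypothetically) each had
    # a true error rate of 1e-5, the expected number that would nonetheless
    # classify every training point correctly is in the hundreds.
    n_settings = 10_000_000
    print("expected lucky survivors:", round(n_settings * p_survive))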
The upshot is that the kind of complexity which matters for learning, and so
for Occam's Razor, is the complexity of classes of models , not of individual
models nor of the system being modeled. It is important to keep this point in
mind when we try to measure the complexity of systems (§8). 10
2.4. Relation of Complex Systems Science to Statistics
Complex systems scientists often regard the field of statistics as irrelevant
to understanding such systems. This is understandable, since the exposure most