empirical errors are close to their true values, just that the smallest empirical
error is close to the smallest generalization error.
Recall that the sampling error is the gap $\varepsilon = \hat{L}(R) - \mathrm{E}[L(R)]$ between the empirical error and the generalization error. It is natural to assume that as our sample size N becomes larger, our sampling error $\varepsilon$ will approach zero. (We will return to this assumption below.) Suppose we could find a function $\eta(N)$ to bound our sampling error, such that $|\varepsilon| \leq \eta(N)$. Then we could guarantee that our choice of model was approximately correct; if we wanted to be sure that our prediction errors were within $\varepsilon$ of the best possible, we would merely need to have $N(\varepsilon) = \eta^{-1}(\varepsilon)$ data-points.
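To make the inversion concrete, here is a minimal sketch, assuming a Hoeffding-type bound (our illustration, not the text's): for a loss bounded in [0, 1], with probability at least $1 - \alpha$ we have $|\varepsilon| \leq \eta(N) = \sqrt{\ln(2/\alpha)/(2N)}$, which inverts to $N(\varepsilon) = \ln(2/\alpha)/(2\varepsilon^2)$. The function names are our own.

    import math

    # Assumed Hoeffding-type bound for a loss in [0, 1], confidence 1 - alpha:
    # |sampling error| <= eta(N) = sqrt(ln(2/alpha) / (2N)).

    def eta(n, alpha=0.05):
        """Bound on the sampling error after n samples."""
        return math.sqrt(math.log(2 / alpha) / (2 * n))

    def n_required(eps, alpha=0.05):
        """Invert eta: the sample size N(eps) that drives the bound below eps."""
        return math.ceil(math.log(2 / alpha) / (2 * eps ** 2))

    print(eta(10_000))       # about 0.0136 after 10,000 samples
    print(n_required(0.01))  # 18445 samples guarantee eps = 0.01

Note that under such a bound $N(\varepsilon)$ grows like $1/\varepsilon^2$: each extra decimal place of accuracy costs a hundredfold more data.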
It should not be surprising to learn that we cannot, generally, make ap-
proximately correct guarantees. As the eminent forensic statistician C. Chan
remarked, "Improbable events permit themselves the luxury of occurring" (27),
and one of these indulgences could make the discrepancy between
$\hat{L}(R)$ and $\mathrm{E}[L(R)]$ very large. But if something like the law of large numbers holds, or the
ergodic theorem (§3.2), then for every choice of R,
$$\Pr\Bigl(\bigl|\hat{L}(R) - \mathrm{E}[L(R)]\bigr| > \varepsilon\Bigr) \to 0, \qquad [3]$$
for every positive $\varepsilon$. 5 We should be able to find some function $\delta$ such that
$$\Pr\Bigl(\bigl|\hat{L}(R) - \mathrm{E}[L(R)]\bigr| > \varepsilon\Bigr) \leq \delta(N, \varepsilon, R), \qquad [4]$$
with $\lim_{N \to \infty} \delta(N, \varepsilon, R) = 0$. Then, for any particular R, we could give probably approximately correct (28) guarantees, and say that, e.g., to have 95% confidence
that the true error is within 0.001 of the empirical error requires at least 144,000
samples (or whatever the precise numbers may be). If we can give probably ap-
proximately correct (PAC) guarantees on the performance of one machine, we
can give them for any finite collection of machines. But if we have infinitely
many possible machines, might not there always be some of them which are
misbehaving? Can we still give PAC guarantees when R is continuous?
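To see why finite collections are easy (a sketch under the same assumed Hoeffding bound, not a result quoted from the text): if each of m machines individually satisfies $\Pr(|\hat{L} - \mathrm{E}[L]| > \varepsilon) \leq 2e^{-2N\varepsilon^2}$, the union bound controls all of them at once at the price of a factor of m, so the required sample size grows only logarithmically in m.

    import math

    # Assumed per-machine Hoeffding bound, loss in [0, 1]:
    # Pr(|empirical - true error| > eps) <= 2 exp(-2 N eps^2).

    def delta_single(n, eps):
        """Deviation probability for one fixed machine."""
        return 2 * math.exp(-2 * n * eps ** 2)

    def delta_collection(n, eps, m):
        """Union bound: probability that ANY of m machines deviates by > eps."""
        return min(1.0, m * delta_single(n, eps))

    def n_required(eps, alpha, m):
        """Sample size for all m machines to be within eps, confidence 1 - alpha."""
        return math.ceil(math.log(2 * m / alpha) / (2 * eps ** 2))

    for m in (1, 10, 10_000):
        print(m, n_required(eps=0.01, alpha=0.05, m=m))  # 18445, 29958, 64497

Letting m grow without bound makes the union bound vacuous, which is why the continuous case needs the capacity argument that follows.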
The answer to this question depends on how flexible the set of machines is: its capacity. We need to know how easy it is to find an R such that $f(X, R)$ will accommodate itself to any Y. This is measured by a quantity called the Vapnik-Chervonenkis (VC) dimension (22). 6 If the VC dimension d of a class of machines is finite, one can make a PAC guarantee that applies to all machines in the class simultaneously:
$$\Pr\Bigl(\max_{R}\,\bigl|\hat{L}(R) - \mathrm{E}[L(R)]\bigr| \geq \varepsilon\Bigr) \leq \eta(N, d, \varepsilon), \qquad [5]$$
where the function $\eta(N, d, \varepsilon)$ expresses the rate of convergence. It depends on the
particular kind of loss function involved. For example, for binary classification,
if the loss function is the fraction of inputs misclassified,