I(N, d, ε) = √( [ d (ln(2N/d) + 1) + ln(4/ε) ] / N )          [6]
Notice that R is not an argument to I, and does not appear in [6]. The rate of
convergence is the same across all machines; this kind of result is thus called a
uniform law of large numbers. The really remarkable thing about [5] is that it
holds no matter what the sampling distribution is, so long as samples are
independent; it is a distribution-free result.
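To get a feel for how fast this guarantee tightens, here is a small Python sketch (a minimal illustration assuming the reconstruction of [6] above; the function name vc_bound and the particular values of N, d, and ε are ours, not the chapter's) that evaluates the bound for a few sample sizes and VC dimensions.

```python
from math import log, sqrt

def vc_bound(n_samples, vc_dim, eps):
    """Deviation bound I(N, d, eps) of [6] (as reconstructed above):
    sqrt( (d*(ln(2N/d) + 1) + ln(4/eps)) / N ),
    where eps is the allowed failure probability."""
    return sqrt((vc_dim * (log(2 * n_samples / vc_dim) + 1)
                 + log(4 / eps)) / n_samples)

# Whatever the sampling distribution, the guarantee shrinks roughly
# like sqrt(d * ln(N) / N) as the sample grows.
for d in (5, 50):
    for n in (10**3, 10**4, 10**5):
        print(f"d={d:3d}  N={n:7d}  I={vc_bound(n, d, 0.05):.3f}")
```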
The VC bounds lead to a very nice learning scheme: simply apply empirical
risk minimization, for a fixed class of machines, and then give a PAC guarantee
that the one picked is, with high reliability, very close to the actual optimal
machine. The VC bounds also lead to an appealing penalization scheme, where the
penalty is equal to our bound on the over-fitting, I. Specifically, we set the term
M_d(R) in [1] equal to the I in [5], ensuring, with high probability, that the F and
M_d(R) terms in [2] cancel each other. This is structural risk minimization
(SRM).
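As a concrete illustration of the SRM selection rule, the Python sketch below assumes we already know, for each candidate model class, its empirical risk on the data and (an upper bound on) its VC dimension; the helper names, the penalty formula (the reconstructed [6]), and the toy numbers are ours, not the chapter's.

```python
from math import log, sqrt

def vc_penalty(n_samples, vc_dim, eps):
    """Capacity penalty: the over-fitting bound I from [5]/[6]
    (same formula as vc_bound in the previous sketch)."""
    return sqrt((vc_dim * (log(2 * n_samples / vc_dim) + 1)
                 + log(4 / eps)) / n_samples)

def structural_risk_minimization(classes, n_samples, eps=0.05):
    """classes: list of (name, empirical_risk, vc_dim) for the candidate
    model classes.  Pick the class minimizing empirical risk + penalty."""
    scored = [(emp_risk + vc_penalty(n_samples, d, eps), name)
              for name, emp_risk, d in classes]
    return min(scored)

# Hypothetical numbers for, say, polynomial classifiers of increasing degree:
candidates = [("degree 1", 0.21, 3),
              ("degree 3", 0.12, 10),
              ("degree 9", 0.05, 55)]
bound, best = structural_risk_minimization(candidates, n_samples=2000)
print(best, "has the smallest guaranteed risk:", round(bound, 3))
```

With these made-up numbers the middle class wins: the simplest class under-fits, while the richest class's low empirical risk is more than eaten up by its capacity penalty.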
It's important to realize that the VC dimension is not the same as the number
of parameters. For some classes of functions, it is much lower than the number
of parameters, and for others it's much higher. (There are examples of one-parameter
classes of functions with infinite VC dimension; the classic one is the family of
thresholded sine waves sign(sin(ωx)), with ω as the parameter.) Determining the VC
dimension often involves subtle combinatorial arguments, but many results are
now available in the literature, and more are appearing all the time. There are
even schemes for experimentally estimating the VC dimension (29).
Two caveats are in order. First, because the VC bounds are distribution-free,
they are really about the rate of convergence under the worst possible
distribution, the one a malicious adversary out to foil our data mining would
choose. This means that in practice, convergence is often much faster than [5]
would indicate. Second, the usual proofs of the VC bounds all assume independent,
identically distributed samples, though the relationship between X and Y can
involve arbitrarily complicated dependencies.7 Recently, there has been much
progress in proving uniform laws of large numbers for dependent sequences of
samples, and structural risk minimization has been extended to what are called
"mixing" processes (30), in effect including an extra term in the I function
appearing in [5] that discounts the number of observations by their degree of
mutual dependence.
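The precise correction is worked out in (30); purely to illustrate the idea of discounting dependent observations, the sketch below uses a common rule of thumb for autocorrelated series, an "effective" sample size. This is our simplification for illustration only, not the construction in (30).

```python
def effective_sample_size(n_samples, rho1):
    """Rule-of-thumb discount for dependent data: an AR(1)-like series with
    lag-one autocorrelation rho1 carries about as much information as
    N*(1 - rho1)/(1 + rho1) independent draws.  (Illustration only,
    not the construction in reference (30).)"""
    return n_samples * (1 - rho1) / (1 + rho1)

# 10,000 strongly dependent observations (rho1 = 0.9) count for only ~526;
# that reduced N is what would then be plugged into the bound of [6].
print(effective_sample_size(10_000, 0.9))
```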
2.2. Choice of Architecture
The basic idea of data mining is to fit a model to data with minimal
assumptions about what the correct model should be, or how the variables in the
data are related. (This differs from such classical statistical questions as testing