Modeling Methodology: Dimension Reduction and Resampling Methods - Neural Networks: Methodology and Applications

Information Technology Reference

In-Depth Information

3. Estimate the standard deviation from the standard error computed for all

replicas,

B

θ ∗ ( b )

θ ∗ (

b =1

·

)=

B

θ ∗ ( b )

) 2

B

θ ∗ (

−

·

b =1

σ 2

B

=

.

B

−

1

One of the theorems proved by Efron deals with the composition of the

bootstrap estimator. The estimate σ B converges to the standard deviation

σ F ( θ ∗ ) of the parameter θ estimated from the sample distribution,

lim

B→∞

σ B = σ F .

That algorithm applies to any estimator. For instance, consider the com-

putation of the largest eigenvalue for PCA. It is the largest eigenvalue of the

variance-covariance matrix X n×p = X T X of the observations. The bootstrap

consists in simulating the replicas X n×p obtained by n random selections of

the lines of matrix X n×p . The statistics (average and standard deviation) may

then be generated easily. That shows the power of the method and its ease of

use. However, that method was not widely used in the past, because of the

amount of computation required: 50 to 200 bootstrapped bases are enough to

estimate an average, but several thousand bootstrapped bases are required to

determine the confidence intervals.

3.6.3 The Generalization Error Estimated by the Bootstrap

In the previous chapter, we emphasized the importance of estimating the

generalization error, and we described the estimation by leave-one-out .

The bootstrap technique can also be used advantageously to estimate that

error. The principle is the same: it consists in simulating B “bootstrapped”

bases. Each base may contain the same example several times, because of

random selection with replacement.

Binomial Distribution of Bootstrapped Bases

For each random selection, all examples have the same probability p =1 /n ,

where n is the number of examples. The number of occurrences of an example

in a bootstrapped base therefore obeys a binomial distribution B ( n,p =1 /n ).

The probability of an example appearing k times is given by P ( k )= C n p k (1

−

p ) n−k [Saporta 1990].

The probability of an element not appearing in the bootstrapped base

is therefore P (0) = (1

1 /n ) n . For su ciently large values of n , one has

−

P (0) n→∞ = e − 1

≈

0 . 368. On the average, 37% of the examples will not be

used for training.

Neural Networks: Methodology and Applications

Search WWH ::

Custom Search

Home