Modeling Methodology: Dimension Reduction and Resampling Methods - Neural Networks: Methodology and Applications


The bootstrap, proposed by [Efron 1993], is a technique that has been extensively investigated in the context of statistical inference, especially for hypothesis testing and confidence interval estimation. It does not require any assumptions about the underlying probability distributions. When applied to regression, the bootstrap is used to estimate the statistical characteristics of the difference between the training error and the generalization error. The approach is well suited to problems for which the number of observations is small. That is particularly true in scientific computing and in the simulation of complex systems, where analytical functions, built by regression or interpolation from a database, are used as replacements for software modules that are more computationally demanding.

In the previous chapter, we emphasized the importance of model validation (estimation of the modeling error, of confidence intervals, etc.) in the general context of nonlinear modeling. In the type of application mentioned above (replacement of a complex computation code by regression on data generated by that code), the problem is exactly the same, except that computer-generated data contain no noise other than numerical roundoff errors. This section describes an alternative to the approaches discussed in the previous chapter.

3.6.1 Principle of the Bootstrap

We will illustrate the principle of the bootstrap with the example of estimating the confidence interval for the expectation µ of a random variable. The purpose of the example, taken from [Wonnacott 1990], is simply to demonstrate the principle of the bootstrap clearly. In this example, the confidence interval of the expectation can be derived accurately from the average and variance computed on the sample (as described in Chap. 2). That result stems from the central limit theorem, which states that the distribution of the average of a sample converges towards a normal distribution as the sample size grows.

Let us take a sample of n = 10 observations of the random variable: $x = \{16, 12, 14, 6, 43, 7, 0, 54, 25, 13\}$. The average of the sample is

$$\bar{X} = \sum_{i=1}^{10} \frac{x_i}{10} = 19.0,$$

and its standard deviation is

$$s = \sqrt{\sum_{i=1}^{10} \frac{(x_i - 19)^2}{9}} = 17.09.$$

The 95% confidence interval of the expectation µ is

$$\mu = \bar{X} \pm t_{.025}\,\frac{s}{\sqrt{n}} = 19.0 \pm 2.26\,\frac{17.09}{\sqrt{10}} \approx 19 \pm 12 \;\Rightarrow\; 7 < \mu < 31.$$
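The arithmetic of this classical interval can be checked with a few lines of Python. The following is a minimal sketch, not part of the original text: it uses only the standard library, the variable names are illustrative, and the critical value 2.26 is simply copied from the text (t_.025 with n − 1 = 9 degrees of freedom) rather than computed.

```python
import math
import statistics

# Sample of n = 10 observations, copied from the text.
x = [16, 12, 14, 6, 43, 7, 0, 54, 25, 13]
n = len(x)

# Critical value t_.025 for n - 1 = 9 degrees of freedom,
# taken as quoted in the text (assumption: not recomputed here).
T_CRIT = 2.26

mean = statistics.mean(x)   # sample average
s = statistics.stdev(x)     # sample standard deviation (divides by n - 1)

# Half-width of the 95% confidence interval: t * s / sqrt(n).
half_width = T_CRIT * s / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width

print(f"mean = {mean:.1f}, s = {s:.2f}")
print(f"95% confidence interval: {lower:.1f} < mu < {upper:.1f}")
```

Rounded to whole numbers, the bounds agree with the interval 7 < µ < 31 given above. Note that `statistics.stdev` uses the n − 1 denominator, which matches the divisor 9 in the formula for s.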
