The bootstrap, proposed by [Efron 1993], is a technique that has been
extensively investigated in the context of statistical inference, especially for
hypothesis testing and confidence interval estimation. It does not require any
assumption about the underlying probability laws. When applied to regression,
the bootstrap is used to estimate the statistical characteristics of the
difference between the training error and the generalization error. The approach
is ideally suited to problems for which the number of observations is small.
Small samples are particularly common in scientific computing and in the
simulation of complex systems, where analytical functions built by regression or
interpolation from a database are used as replacements for software modules that
are more computationally demanding.
In the previous chapter, we emphasized the importance of model validation
(estimation of the modeling error, of confidence intervals, etc.) in the
general context of nonlinear modeling. In the type of application mentioned
above (replacement of a complex computation code by regression on data
generated by that code), the problem is exactly the same, except that
computer-generated data contain no noise other than numerical roundoff
errors. This section describes an alternative to the approaches discussed in
the previous chapter.
3.6.1 Principle of the Bootstrap
We illustrate the principle of the bootstrap by estimating the confidence
interval for the expectation µ of a random variable. The example, taken from
[Wonnacott 1990], is intended only to demonstrate the principle clearly: in
this case, the confidence interval of the expectation can be derived accurately
from the average and variance computed on the sample (as described in Chap. 2).
That result stems from the central limit theorem, which states that the
distribution of the average of a sample converges towards a normal law as the
sample size grows, in practice quite quickly.
Let us take a sample of n = 10 observations of the random variable:
$$x = \{16, 12, 14, 6, 43, 7, 0, 54, 25, 13\}.$$
The average of the sample is
$$\bar{X} = \sum_{i=1}^{10} \frac{x_i}{10} = 19.0,$$
and its standard deviation is
$$s = \sqrt{\sum_{i=1}^{10} \frac{(x_i - 19)^2}{9}} = 17.09.$$
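The arithmetic above can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are illustrative and do not come from the text.

```python
import math
import statistics

# Sample of n = 10 observations, taken from the text.
x = [16, 12, 14, 6, 43, 7, 0, 54, 25, 13]

n = len(x)
mean = sum(x) / n  # sample average

# Sample standard deviation: the sum of squares is divided by n - 1 = 9.
s = math.sqrt(sum((xi - mean) ** 2 for xi in x) / (n - 1))

# Cross-check against the standard library, which also divides by n - 1.
assert math.isclose(s, statistics.stdev(x))

print(f"mean = {mean:.1f}, s = {s:.2f}")  # mean = 19.0, s = 17.09
```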
The 95% confidence interval of the expectation µ is
$$\mu = \bar{X} \pm t_{.025} \frac{s}{\sqrt{n}} = 19.0 \pm 2.26 \, \frac{17.09}{\sqrt{10}} \approx 19 \pm 12,$$
that is, $7 < \mu < 31$.
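As a rough illustration, the sketch below recomputes this classical interval and, for comparison, a percentile bootstrap interval on the same sample. The percentile bootstrap is one common variant and is assumed here for illustration; it is not necessarily the procedure the text goes on to describe. The helper `percentile_bootstrap_ci`, the number of resamples, and the seed are hypothetical choices, and the t quantile is hard-coded from a table rather than computed.

```python
import math
import random
import statistics

# Sample from the text.
x = [16, 12, 14, 6, 43, 7, 0, 54, 25, 13]
n = len(x)
mean = statistics.fmean(x)
s = statistics.stdev(x)  # divides by n - 1

# Classical interval. Assumption: the Student t quantile for 9 degrees of
# freedom (2.5% in each tail) is hard-coded; the text rounds it to 2.26.
T_QUANTILE = 2.262
half_width = T_QUANTILE * s / math.sqrt(n)
t_interval = (mean - half_width, mean + half_width)


def percentile_bootstrap_ci(sample, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    size = len(sample)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=size))
        for _ in range(n_resamples)
    )
    # Cut off a fraction alpha / 2 of the resampled means in each tail.
    k = round(alpha / 2 * n_resamples)
    return means[k], means[n_resamples - k - 1]


boot_interval = percentile_bootstrap_ci(x)

print(f"t interval:         {t_interval[0]:.1f} < mu < {t_interval[1]:.1f}")
print(f"bootstrap interval: {boot_interval[0]:.1f} < mu < {boot_interval[1]:.1f}")
```

The classical interval should round to the 7 < µ < 31 given above. The bootstrap interval depends on the seed and on the number of resamples, and with only ten observations it need not coincide with the classical one.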
 