Cross-Validation, the Jackknife, and the Bootstrap: Excess Error Estimation in Forward Logistic Regression - Common Errors in Statistics


$$\hat{r}_B = \frac{1}{B}\sum_{b=1}^{B} \hat{R}^{*}_{b}, \qquad M_B = \text{MSE}(\hat{r}_B).$$
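As a rough illustration of a Monte Carlo average over B bootstrap replications, the Python sketch below computes a bootstrap estimate of excess error. Each replication refits the rule on a resample, then records the rule's error rate on the original sample minus its apparent error rate on the resample; the B differences are averaged. Gregory's forward logistic regression rule is far too involved to reproduce here, so the sketch swaps in a hypothetical one-variable rule (`fit_midpoint_rule`, which thresholds at the midpoint of the two class means) on simulated data. All function names, the data generator, and the seeds are assumptions for illustration only.

```python
import random
from statistics import fmean


def fit_midpoint_rule(xs, ys):
    """Hypothetical stand-in rule: threshold x at the midpoint of the class means."""
    ones = [x for x, y in zip(xs, ys) if y == 1]
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    if not ones or not zeros:
        # A resample may contain only one class; then predict that class everywhere.
        only = ys[0]
        return lambda x: only
    m1, m0 = fmean(ones), fmean(zeros)
    cut = (m1 + m0) / 2
    if m1 >= m0:
        return lambda x: 1 if x > cut else 0
    return lambda x: 1 if x < cut else 0


def error_rate(rule, xs, ys):
    """Proportion of cases the rule misclassifies."""
    return fmean([1.0 if rule(x) != y else 0.0 for x, y in zip(xs, ys)])


def bootstrap_excess_error(xs, ys, B, rng):
    """Average over B resamples of (error on original sample - apparent error on resample)."""
    n = len(xs)
    total = 0.0
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        rule = fit_midpoint_rule(bx, by)
        total += error_rate(rule, xs, ys) - error_rate(rule, bx, by)
    return total / B


def make_data(n, rng):
    """Hypothetical simulated sample: class y in {0, 1}, x ~ Normal(y, 1)."""
    ys = [i % 2 for i in range(n)]
    xs = [rng.gauss(float(y), 1.0) for y in ys]
    return xs, ys


if __name__ == "__main__":
    xs, ys = make_data(20, random.Random(1))
    apparent = error_rate(fit_midpoint_rule(xs, ys), xs, ys)
    print("apparent error:", apparent)
    for B in (100, 1000):
        print(B, bootstrap_excess_error(xs, ys, B, random.Random(7)))
```

Comparing the B = 100 and B = 1000 outputs gives a feel for how much of the estimate's variability is Monte Carlo noise rather than sampling variability.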

Using a component-of-variance calculation (Gong 1982), for Simulation 1.1,

$$M_{\infty}^{1/2} = 0.1070 \approx 0.1078 = M_{100}^{1/2};$$

so if we are interested in comparing root mean squared errors about the excess error, we need not perform more than B = 100 bootstrap replications.
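The conclusion about B = 100 can be checked with a little arithmetic. Write $M_B$ for the mean squared error of the bootstrap estimate based on B replications. This sketch assumes the standard decomposition $M_B = M_\infty + v/B$, where $v$ is the average within-sample variance of a single bootstrap replication; the calculation in Gong (1982) may differ in detail, and the helper name `rmse_for_B` is hypothetical. Under that assumption, the two quoted root MSEs pin down $v$, and the root MSE for any other B follows.

```python
import math


def rmse_for_B(m_inf, v, B):
    """Root MSE of a B-replication bootstrap estimate, assuming M_B = M_inf + v / B."""
    return math.sqrt(m_inf + v / B)


# Root MSEs quoted for Simulation 1.1 (B -> infinity and B = 100).
rmse_inf, rmse_100 = 0.1070, 0.1078
m_inf = rmse_inf ** 2
# Solve M_100 = M_inf + v / 100 for the Monte Carlo variance component v.
v = 100 * (rmse_100 ** 2 - m_inf)

for B in (25, 50, 100, 200, 1000):
    print(f"B = {B:4d}: root MSE ~ {rmse_for_B(m_inf, v, B):.4f}")
```

Under this model, doubling B from 100 to 200 lowers the root MSE by only about 0.0004, which is small next to the differences between estimators that the simulations are designed to detect.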

In each simulation, I included 400 experiments and therefore used the approximation

$$\text{MSE}_1(\hat{r}) \equiv E\left[\hat{r} - R\right]^2 \approx \frac{1}{400}\sum_{e=1}^{400}\left[\hat{r}_e - R_e\right]^2,$$

where $\hat{r}_e$ and $R_e$ are the estimate and the true excess error of the $e$th experiment.
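The 400-experiment approximation is an average of squared errors, and its square root is the estimated $\text{RMSE}_1$. A minimal Python sketch follows; `rmse1` is a hypothetical helper name, and its inputs are assumed to be the per-experiment estimates $\hat r_e$ and true excess errors $R_e$ recorded during a simulation.

```python
import math


def rmse1(estimates, true_excess):
    """sqrt((1/N) * sum_e (r_hat_e - R_e)^2), the Monte Carlo stand-in for sqrt(E[r_hat - R]^2)."""
    if not estimates or len(estimates) != len(true_excess):
        raise ValueError("need equal-length, non-empty sequences")
    total = sum((r_hat - r_true) ** 2 for r_hat, r_true in zip(estimates, true_excess))
    return math.sqrt(total / len(estimates))


# Toy numbers for illustration only (not results from the paper).
print(rmse1([0.3, 0.1], [0.0, 0.5]))
```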

Figures 2 and 3 show 95% nonsimultaneous confidence intervals for the $\text{RMSE}_1$'s and $\text{RMSE}_2$'s. Shorter intervals for the $\text{RMSE}_1$'s would be preferable, but obtaining them would be time-consuming. Four hundred experiments of simulation 1.1 with p = 4, n = 20, and B = 100 took 16 computer hours on the PDP-11/34 minicomputer, whereas 400 experiments of simulation 2.3 with p = 6, n = 60, and B = 100 took 72 hours. Because the length of each interval shrinks in proportion to the inverse square root of the number of experiments, halving the length of the confidence intervals in Figures 2 and 3 would require four times the number of experiments and four times the computer time. On the other hand, for each simulation in Figure 3, the confidence interval for $\text{RMSE}_2(\text{ideal})$ is disjoint from that of $\text{RMSE}_2(\text{boot})$, and both are disjoint from the confidence intervals for $\text{RMSE}_2(\text{jack})$, $\text{RMSE}_2(\text{cross})$, and $\text{RMSE}_2(\text{app})$. Thus, for $\text{RMSE}_2$, we can convincingly argue that the number of experiments is sufficient.
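One simple way to form such an interval, and to see why halving its length costs four times the experiments, is sketched below. The sketch assumes a normal approximation for the mean of the squared errors and maps its endpoints through the square root. The intervals in the figures may have been computed differently, and `rmse_interval`, `hours_needed`, and the toy error sequence are hypothetical.

```python
import math
from statistics import fmean, stdev


def rmse_interval(errors, z=1.96):
    """Approximate 95% interval for RMSE from per-experiment errors r_hat_e - R_e.

    Assumes the mean of the squared errors is approximately normal and maps the
    endpoints through sqrt, which is monotone on [0, inf).
    """
    sq = [e * e for e in errors]
    mean_sq = fmean(sq)
    se = stdev(sq) / math.sqrt(len(sq))
    lower = max(0.0, mean_sq - z * se)
    return math.sqrt(lower), math.sqrt(mean_sq + z * se)


def hours_needed(base_hours, shrink):
    """Interval length scales like 1/sqrt(N), so shrinking it by `shrink` needs
    shrink**2 times the experiments, and roughly that much computer time."""
    return base_hours * shrink ** 2


# Toy errors for illustration only (not data from the paper).
errors = [((i % 7) - 3) * 0.01 for i in range(400)]
lo, hi = rmse_interval(errors)
# Repeating the errors four times mimics 4x the experiments with the same spread.
lo4, hi4 = rmse_interval(errors * 4)
print((hi4 - lo4) / (hi - lo))  # close to 0.5
print(hours_needed(16, 2), hours_needed(72, 2))  # 64 288
```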


5. THE RELATIONSHIP BETWEEN CROSS-VALIDATION AND THE JACKKNIFE

Efron (1982) conjectured that the cross-validation and jackknife estimates of excess error are asymptotically close. Gong (1982) proved Efron's conjecture, but the regularity conditions stated there do not hold for Gregory's rule. The conjecture nevertheless seems to hold for Gregory's rule, as evidenced in Figure 4, a scatterplot of the jackknife and cross-validation estimates for the first 100 experiments of simulation 1.1. The plot shows points hugging the 45° line, whereas a scatterplot of the bootstrap and cross-validation estimates exhibits no such behavior.
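Closeness to the 45° line can also be summarized numerically. The sketch below, with a hypothetical function name and made-up paired values, reports the mean perpendicular distance of the points from the line y = x (which is $|y - x|/\sqrt{2}$ for each point) together with the Pearson correlation. The correlation alone is not enough, since points on any rising straight line, say y = 2x, also have correlation 1.

```python
import math
from statistics import fmean


def closeness_to_diagonal(xs, ys):
    """Return (mean perpendicular distance to y = x, Pearson correlation) for paired estimates."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two pairs")
    # Perpendicular distance from (x, y) to the line y = x is |y - x| / sqrt(2).
    mean_dist = fmean([abs(y - x) / math.sqrt(2) for x, y in zip(xs, ys)])
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    corr = sxy / math.sqrt(sxx * syy)
    return mean_dist, corr


# Made-up paired excess-error estimates for illustration only.
jack = [0.10, 0.15, 0.05, 0.20]
cross = [0.11, 0.14, 0.06, 0.21]
print(closeness_to_diagonal(jack, cross))
```

Applied to jackknife and cross-validation pairs, a small mean distance corresponds to the tight band around the diagonal in Figure 4.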
