Only the 21 variables whose coefficients were significant at the 25
percent level were allowed to enter the equation on the second pass. The
results were as follows:
$R^2 = 0.36$, $P = 5 \times 10^{-4}$
14 coefficients out of 15 were significant at the 25 percent level;
6 coefficients out of 15 were significant at the 5 percent level.
The results from the second pass are misleading indeed, for they appear
to demonstrate a definite relationship between $Y$ and the $X$'s, that is,
between noise and noise. Graphical methods cannot help here; in effect, $Y$
and the selected $X$'s follow a jointly normal distribution conditioned on
having significant $t$ statistics. The simulation was done 10 times; the
results are shown in Table 1. The 25 percent level was selected to
represent an “exploratory” analysis; 5 percent for “confirmatory.” The
simulation was done in SAS on the UC Berkeley IBM 4341 by Mr. Thomas
Permutt, on April 16, 1982.
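The two-pass screening procedure can be sketched in a few lines. The sketch below is not the SAS program used for the simulation; it is a simplified illustration under assumptions introduced here: the predictors are taken to have orthonormal columns (the first p columns of an identity matrix) rather than being normally distributed, the sizes n = 100 and p = 50 are illustrative choices, and the critical value comes from a normal approximation to the t distribution. The function name screen_noise is hypothetical.

```python
import random
from statistics import NormalDist


def screen_noise(n=100, p=50, level=0.25, seed=0):
    """Two-pass screening of pure noise, assuming an orthonormal design.

    Assumption: X is the first p columns of the n x n identity matrix,
    so the least-squares coefficient of column j is y[j] and its
    standard error is the residual standard deviation.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # Y is noise, unrelated to X
    total_ss = sum(v * v for v in y)  # no intercept: E[Y] = 0

    # Two-sided critical value (normal approximation to the t quantile).
    crit = NormalDist().inv_cdf(1.0 - level / 2.0)

    # First pass: all p columns enter the equation.
    resid_ss = sum(v * v for v in y[p:])
    sigma1 = (resid_ss / (n - p)) ** 0.5
    keep = [j for j in range(p) if abs(y[j]) / sigma1 > crit]

    # Second pass: only the screened columns enter the equation.
    q = len(keep)
    fit_ss = sum(y[j] ** 2 for j in keep)
    sigma2 = ((total_ss - fit_ss) / (n - q)) ** 0.5
    still = sum(1 for j in keep if abs(y[j]) / sigma2 > crit)

    return {
        "kept": q,
        "r2_pass1": 1.0 - resid_ss / total_ss,
        "r2_pass2": fit_ss / total_ss,
        "significant_pass2": still,
    }


if __name__ == "__main__":
    print(screen_noise())
```

Running the function with different seeds gives a rough feel for how often screened noise variables stay "significant" on the second pass; exact figures will differ from those reported in the text because the design and the critical value are simplified.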
3. SOME ASYMPTOTICS
An asymptotic calculation is helpful to explain the results of the simulation
experiment. The $Y$ and the $X$'s are independent; condition $X$ to be
constant. There is no reason to treat the intercept separately since the
$Y$'s and $X$'s all have expectation zero. Finally, suppose $X$ has
orthonormal columns. The resulting model is
    $Y = X\beta + \epsilon$    (1)
where $Y$ is an $n \times 1$ random vector, $X$ is a constant $n \times p$
matrix with orthonormal columns, where $p \le n$, while $\beta$ is a
$p \times 1$ vector of parameters, and $\epsilon$ is an $n \times 1$ vector
of independent normals, having mean 0 and common variance $\sigma^2$. In
particular, the rank of $X$ is $p$. All probabilities are computed assuming
the null hypothesis that $\beta \equiv 0$. Suppose
    $n \to \infty$ and $p \to \infty$ so that $p/n \to \rho$, where $0 < \rho < 1$.    (2)
Let $R_n^2$ be the square of the conventional multiple correlation
coefficient, and $F_n$ the conventional $F$ statistic for testing the null
hypothesis $\beta \equiv 0$. Under these conditions, the next proposition
shows that $R_n^2$ will be essentially the ratio of the number $p$ of
variables to the number $n$ of data points; the proof is deferred.
Proposition. Assume (1) and (2). Then

    $R_n^2 \to \rho$ and $F_n \to 1$ in probability.    (3)
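A small numerical check can make the proposition concrete. The sketch below assumes a particular orthonormal design chosen here for convenience (the first p columns of the n x n identity matrix), takes the error variance to be 1, and fixes p/n at 0.5; the function name r2_and_f is hypothetical. Under the null hypothesis the fitted sum of squares is then a sum of p squared standard normals and the total is a sum of n, so the ratio should settle near p/n as n grows.

```python
import random


def r2_and_f(n, p, seed=0):
    """Return (R^2, F) for pure-noise Y regressed on p orthonormal columns.

    Assumption: X is the first p columns of the n x n identity matrix,
    so the fitted values are the first p coordinates of Y. No intercept
    is fitted, since Y has expectation zero.
    """
    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # beta = 0, sigma = 1
    fit_ss = sum(v * v for v in y[:p])    # sum of squares explained by X
    resid_ss = sum(v * v for v in y[p:])  # residual sum of squares
    r2 = fit_ss / (fit_ss + resid_ss)
    f = (fit_ss / p) / (resid_ss / (n - p))
    return r2, f


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        r2, f = r2_and_f(n, n // 2)
        print(f"n={n:6d}  p/n=0.5  R^2={r2:.3f}  F={f:.3f}")
```

As n increases with p/n held fixed, the printed R^2 values should drift toward 0.5 and the F values toward 1, in line with (3); any single run is only an illustration, not a proof.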