Now consider redoing the regression after dropping the columns of $X$ that fail to achieve significance at level $\alpha$. Here, $0 < \alpha < 1$ is fixed. Let $q_{n,\alpha}$ be the number of remaining columns. Let $R^2_{n,\alpha}$ be the square of the conventional multiple correlation in this second regression, and let $F_{n,\alpha}$ be the $F$ statistic. These are to be computed by the standard formulas, that is, without any adjustment for the preliminary screening.
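The two-pass procedure can be simulated directly. The sketch below makes several assumptions that are ours, not the text's. Y and the columns of X are independent standard normal noise. There is no intercept, so $R^2$ is uncentered. The first-pass screen compares each $|t|$ to the normal cutoff rather than a $t$ quantile. The function name `screened_regression` and the defaults `n=100`, `p=50`, `alpha=0.25` are hypothetical. The second-pass $R^2$ and $F$ come from the standard formulas, with no adjustment for the screen.

```python
import numpy as np
from statistics import NormalDist


def screened_regression(n=100, p=50, alpha=0.25, seed=0):
    """Regress pure-noise Y on pure-noise X, drop columns whose |t| fails the
    two-tailed level-alpha cutoff, refit, and return (q, R^2, F) for the refit.
    Assumptions (hypothetical): no intercept, uncentered R^2, normal cutoff."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    Y = rng.standard_normal(n)  # independent of X: there is no real signal

    # First pass: ordinary least squares on all p columns.
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ Y
    resid = Y - X @ beta
    s2 = resid @ resid / (n - p)
    t = beta / np.sqrt(s2 * np.diag(XtX_inv))

    # Keep only the columns significant at level alpha (two-tailed z cutoff).
    lam = NormalDist().inv_cdf(1 - alpha / 2)
    keep = np.abs(t) > lam
    q = int(keep.sum())
    if q == 0:
        return 0, 0.0, float("nan")

    # Second pass: refit on the survivors with the standard formulas.
    Xq = X[:, keep]
    beta_q, *_ = np.linalg.lstsq(Xq, Y, rcond=None)
    fitted = Xq @ beta_q
    r2 = (fitted @ fitted) / (Y @ Y)  # uncentered R^2 (no intercept)
    f_stat = (r2 / q) / ((1 - r2) / (n - q))
    return q, float(r2), float(f_stat)


q, r2, f_stat = screened_regression()
print(f"kept {q} columns, R^2 = {r2:.3f}, F = {f_stat:.2f}")
```

Rerunning with different seeds shows how often the screened fit looks "significant" even though Y is unrelated to X.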
To estimate $R^2_{n,\alpha}$ and $F_{n,\alpha}$, the following will be helpful. Let $Z$ be standard normal and $\Phi(z) = P\{|Z| > z\}$. Analytically,
$$\Phi(z) = \sqrt{\frac{2}{\pi}} \int_z^{\infty} \exp\left(-\tfrac{1}{2}u^2\right) du.$$
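The two-tailed tail probability $\Phi(z)$ has the closed form $\operatorname{erfc}(z/\sqrt{2})$. The sketch below checks that form against a direct Simpson's-rule evaluation of the integral. The helper names `Phi` and `Phi_by_quadrature`, and the truncation of the upper limit at 40, are illustrative choices.

```python
import math


def Phi(z):
    """P{|Z| > z} for standard normal Z, via the complementary error function."""
    return math.erfc(z / math.sqrt(2))


def Phi_by_quadrature(z, upper=40.0, steps=20_000):
    """sqrt(2/pi) * integral from z to infinity of exp(-u^2/2) du, by Simpson's rule.

    The upper limit is truncated at `upper`; the neglected tail is negligible
    for moderate z. `steps` must be even for Simpson's rule.
    """
    def integrand(u):
        return math.exp(-u * u / 2)

    h = (upper - z) / steps
    total = integrand(z) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(z + k * h)
    return math.sqrt(2 / math.pi) * total * h / 3


for z in (0.0, 1.0, 1.96):
    print(f"z={z:.2f}  erfc form={Phi(z):.6f}  quadrature={Phi_by_quadrature(z):.6f}")
```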
Choose $\lambda$ so that $\Phi(\lambda) = \alpha$. Thus, $\lambda$ is the cutoff for a two-tailed $z$ test at level $\alpha$. Let
$$g(z) = \int_{\{|Z| > z\}} Z^2 \, dP.$$
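Since $\Phi(\lambda) = \alpha$ means $P\{Z > \lambda\} = \alpha/2$, the cutoff is the $1 - \alpha/2$ quantile of the standard normal. This sketch computes it with the standard library's `statistics.NormalDist` and checks that the tail probability at $\lambda$ recovers $\alpha$. The function name `cutoff` is hypothetical.

```python
import math
from statistics import NormalDist


def cutoff(alpha):
    """Two-tailed z cutoff lambda with P{|Z| > lambda} = alpha, for 0 < alpha < 1."""
    return NormalDist().inv_cdf(1 - alpha / 2)


for alpha in (0.25, 0.10, 0.05):
    lam = cutoff(alpha)
    # Check: the two-tailed tail probability at lambda should give back alpha.
    tail = math.erfc(lam / math.sqrt(2))
    print(f"alpha={alpha:.2f}  lambda={lam:.4f}  P(|Z|>lambda)={tail:.4f}")
```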
For $0 \le z < \infty$, integration by parts shows
$$g(z) = \Phi(z) + \sqrt{\frac{2}{\pi}}\, z \exp\left(-\tfrac{1}{2}z^2\right). \tag{4}$$
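Equation (4) can be checked numerically. The sketch below compares the closed form with Simpson's-rule quadrature of $\sqrt{2/\pi}\int_z^\infty u^2 e^{-u^2/2}\,du$, which equals $g(z)$ by symmetry of the normal density. The helper names are illustrative only.

```python
import math


def Phi(z):
    """P{|Z| > z} for standard normal Z."""
    return math.erfc(z / math.sqrt(2))


def g_closed_form(z):
    """Equation (4): g(z) = Phi(z) + sqrt(2/pi) * z * exp(-z^2/2)."""
    return Phi(z) + math.sqrt(2 / math.pi) * z * math.exp(-z * z / 2)


def g_by_quadrature(z, upper=40.0, steps=20_000):
    """g(z) = E{Z^2; |Z| > z} = sqrt(2/pi) * integral_z^inf u^2 exp(-u^2/2) du,
    evaluated with Simpson's rule on [z, upper] (steps must be even)."""
    def integrand(u):
        return u * u * math.exp(-u * u / 2)

    h = (upper - z) / steps
    total = integrand(z) + integrand(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(z + k * h)
    return math.sqrt(2 / math.pi) * total * h / 3


for z in (0.0, 1.0, 1.96):
    print(f"z={z:.2f}  closed form={g_closed_form(z):.6f}  quadrature={g_by_quadrature(z):.6f}")
```

At $z = 0$ both reduce to $E\{Z^2\} = 1$.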
Clearly,
$$E\{Z^2 \mid |Z| > z\} = \frac{g(z)}{\Phi(z)}. \tag{5}$$
Then, as intuition demands,
$$E\{Z^2 \mid |Z| > z\} = 1 + \sqrt{\frac{2}{\pi}}\, \frac{z \exp\left(-\tfrac{1}{2}z^2\right)}{\Phi(z)} > 1 \quad \text{for } z > 0. \tag{6}$$
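As a sanity check on (6), this sketch compares the closed form with a Monte Carlo average of $Z^2$ over standard normal draws that satisfy $|Z| > z$. The function names, draw count and seed are arbitrary choices made for illustration.

```python
import math
import random


def cond_second_moment(z):
    """Equation (6): E{Z^2 | |Z| > z} = 1 + sqrt(2/pi) * z * exp(-z^2/2) / Phi(z)."""
    Phi = math.erfc(z / math.sqrt(2))  # P{|Z| > z}
    return 1 + math.sqrt(2 / math.pi) * z * math.exp(-z * z / 2) / Phi


def cond_second_moment_mc(z, draws=100_000, seed=0):
    """Monte Carlo estimate: average Z^2 over the draws with |Z| > z."""
    rng = random.Random(seed)
    kept = [x * x for x in (rng.gauss(0, 1) for _ in range(draws)) if abs(x) > z]
    return sum(kept) / len(kept)


for z in (0.5, 1.0, 1.96):
    print(f"z={z:.2f}  formula={cond_second_moment(z):.4f}  simulated={cond_second_moment_mc(z):.4f}")
```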
Let $Z_\lambda$ be $Z$ conditional on $|Z| > \lambda$. Put $z = \lambda$ in (5) and recall that $\Phi(\lambda) = \alpha$:
$$E\{Z_\lambda^2\} = E\{Z^2 \mid |Z| > \lambda\} = \frac{g(\lambda)}{\alpha} > 1. \tag{7}$$
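Equation (7) can be tabulated over a few significance levels. The sketch below takes $\lambda$ from the inverse normal CDF and uses $g(\lambda) = \alpha + 2\lambda\varphi(\lambda)$, which is (4) with $\Phi(\lambda) = \alpha$ and $\sqrt{2/\pi}\,e^{-\lambda^2/2} = 2\varphi(\lambda)$, where $\varphi$ is the standard normal density. The function name and the chosen $\alpha$ values are illustrative.

```python
from statistics import NormalDist


def second_moment_after_screening(alpha):
    """Equation (7): E{Z_lambda^2} = g(lambda) / alpha, where Phi(lambda) = alpha.

    By (4), g(lambda) = Phi(lambda) + 2 * lambda * phi(lambda), with phi the
    standard normal density and Phi(lambda) = alpha by construction.
    """
    std = NormalDist()
    lam = std.inv_cdf(1 - alpha / 2)
    g = alpha + 2 * lam * std.pdf(lam)
    return g / alpha


for alpha in (0.50, 0.25, 0.10, 0.05):
    print(f"alpha={alpha:.2f}  E(Z_lambda^2)={second_moment_after_screening(alpha):.3f}")
```

The value exceeds 1 for every $\alpha$ and grows as the screen becomes stricter.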
Using (6) and further integration by parts,
$$\operatorname{var}\{Z^2 \mid |Z| > z\} = 2 + v(z), \tag{8}$$
where