So, we now can formally conclude that the two variables are, indeed, linearly related.
SIDEBAR: EXCEL'S WEIRD LABEL FOR THE P-VALUE FOR THE F-STATISTIC
We want to add that the middle section of the output, the ANOVA table (you saw ANOVA tables
in several earlier chapters), also gives you a p-value, this one for the F-statistic. You can see the
F-statistic value of 12.097 (see curved arrow in Figure 9.19); its p-value is just to the right of it and
equals 0.040. But wait a moment! This value is exactly the same as the p-value for the slope!
For reasons unknown to the authors, Excel calls the p-value for the F-statistic “Significance F,”
but we assure you that this is the p-value (and it should be called the p-value!). Any time we are running
a simple regression (recall: this means there is only one X variable), the F-statistic will have the
same p-value as the slope's t-test and will provide exactly the same information.
In fact, in writing up a report on the results of a simple regression, you would not want to discuss
the two p-values separately, since doing so would be redundant. In the next chapter, Chapter 10, the p-value
for the F-statistic and that for the slope will have different values and will mean different things.
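To see why the two p-values must agree in a simple regression: the ANOVA F-statistic equals the square of the slope's t-statistic, and an F-distribution with (1, n − 2) degrees of freedom is exactly the distribution of a squared t-statistic with n − 2 degrees of freedom. (So the slope's t-statistic in Figure 9.19 should be about the square root of 12.097, roughly 3.48, in absolute value.) Below is a minimal Python sketch using only the standard library; the function name `simple_regression_stats` and the five data points are hypothetical, made up for illustration, and are not the data behind Figure 9.19.

```python
import math

def simple_regression_stats(x, y):
    """Least-squares fit y = a + b*x; return (t for the slope, ANOVA F)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept

    # ANOVA pieces: total, residual (error), and regression sums of squares
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = sst - sse
    mse = sse / (n - 2)        # residual mean square; sqrt(mse) is Excel's "Standard Error"

    f_stat = ssr / 1 / mse     # regression has 1 degree of freedom (one X variable)
    t_slope = b / math.sqrt(mse / sxx)  # slope divided by its standard error
    return t_slope, f_stat

# Hypothetical data, not the data behind Figure 9.19
t, f = simple_regression_stats([1, 2, 3, 4, 5], [2, 3, 3, 5, 4])
print(f"F = {f:.4f}, t^2 = {t * t:.4f}")
```

For these made-up points both numbers print as 6.7500. With any other data set that has a single X variable, F and t² will still agree up to floating-point rounding, which is why the two p-values are identical.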
There is one final thing that we wish to impart about the output in Figure 9.19,
and that is the “Standard Error,” as listed in row 7 in the top section of the output
(see dashed horizontal arrow in Figure 9.19). Its value equals 0.587, and it is often
denoted Sy.x. This is a key value for finding a confidence interval for a prediction,
often a very important thing to find. In essence, it is the estimated standard deviation
of the error of a prediction if we had the correct regression line. However, we do not
have the exact correct regression line (finding it would, in theory, require infinite
data!). Still, if the sample size is reasonably large (say, at least 25), and we are
predicting for a value of X that is near the mean of our data, we can, as an approximation,
use the standard error value as if it were the overall standard deviation of the prediction.
With this caveat, the formula for a 95% confidence interval for a prediction is
Yc ± TINV(0.05, n − 2) * Sy.x,
where “n” is the sample size (in this example, n = 5) and TINV is an Excel function
that returns a value from the t-distribution. The first value (i.e., 0.05) reflects wanting
95% confidence; it would be 0.01 for 99% confidence, 0.10 for 90% confidence, etc.
The second value, (n − 2), is a degrees-of-freedom number. You really don't need to
know the details of why it takes that value; it is easy to determine, since you know
n, the sample size, and hence you obviously know (n − 2). For our earlier example,
where we predicted a value of Yc to be 3.599, a 95% confidence interval for the value
that will actually come out for an individual person is:
3.599 ± TINV(0.05, 3) * (0.587)
3.599 ± (3.182) * (0.587)
3.599 ± 1.868
or
(1.731 to 5),
with the realization that we cannot get a value that exceeds 5.
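The same interval can be reproduced outside Excel. The following is a minimal Python sketch using only the standard library, under these assumptions: `tinv` is our own stand-in for Excel's TINV (two-tailed, so TINV(0.05, df) is the point with 2.5% of the t-distribution in each tail), computed by numerically integrating the t density and bisecting; `prediction_interval` and its `upper_cap` parameter (standing in for the top of the 5-point response scale) are hypothetical names for illustration. If SciPy is available, `scipy.stats.t.ppf(1 - 0.05 / 2, df)` returns the same critical value.

```python
import math

# Sketch: reproduce the interval Yc ± TINV(0.05, n - 2) * Sy.x in plain Python.
# tinv() is a hypothetical stand-in for Excel's TINV, not a library function.

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus a Simpson's-rule integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def tinv(prob, df):
    """Two-tailed critical value like Excel's TINV: P(|T| > t) = prob.
    Bisects on [0, 100], enough for the usual 0.10/0.05/0.01 levels with df >= 1."""
    target = 1 - prob / 2  # probability to the left of the critical value
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def prediction_interval(yc, s_yx, n, conf=0.95, upper_cap=None):
    """Approximate interval yc ± t * Sy.x, with n - 2 degrees of freedom.
    upper_cap (hypothetical) clips the top end, e.g. 5 for a scale whose maximum is 5."""
    half_width = tinv(1 - conf, n - 2) * s_yx
    lower, upper = yc - half_width, yc + half_width
    if upper_cap is not None:
        upper = min(upper, upper_cap)
    return lower, upper

# Values from the worked example: Yc = 3.599, Sy.x = 0.587, n = 5, responses at most 5.
print(round(tinv(0.05, 3), 3))
low, high = prediction_interval(3.599, 0.587, 5, upper_cap=5)
print(f"({low:.3f} to {high:g})")
```

Because the sketch keeps the unrounded t value and does not round intermediate steps, its limits can differ in the third decimal place from a hand calculation that rounds along the way. Changing `conf` to 0.99 or 0.90 corresponds to using 0.01 or 0.10 as the first argument of TINV.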
 