OK, here's where things really get interesting. If we have a value of X, we can
insert it into the equation for the line, and compute Yc, the value of Y that is predicted
for the value of X we input. For example, if X = 3, we predict that Y is
1.1 + 0.833(3) = 3.599
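As a quick check of this arithmetic, here is a minimal sketch (in Python rather than Excel, purely for illustration) that plugs X = 3 into the fitted line, using the intercept and slope read off Figure 9.19:

```python
# Predicted value Yc from the fitted least-squares line: Yc = 1.1 + 0.833*X
intercept = 1.1
slope = 0.833

def predict(x):
    """Return the predicted value Yc for a given X."""
    return intercept + slope * x

print(predict(3))  # approximately 3.599
```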
But wait, there's more! Check out the correlation coefficient, which is 0.895 (see
solid horizontal arrow in Figure 9.19,³ labeled “Multiple R”). This is a reasonably
high value (and, of course, is the same value we found when we did a correlation
analysis with these same data earlier in the chapter). Loosely, but pragmatically
interpreted, it means we should expect, for the most part, the predicted value of Y and the
actual value of Y to be reasonably close to one another. If we examine the data set, we
see that the average of the (two) Y values when X = 3 is 3.5, which, indeed, is close
to the predicted value, Yc, of 3.599.
If we look right below “Multiple R,” we see “R Square,” which equals 0.801.
As earlier, this indicates that a bit over 80% of the variability in Y (i.e., how come
Y is not always the same!!) is due to the fact that X is not always the same. Indeed,
if X were always the same, the variability in Y would be only about 20% as much
as it is now.
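R Square here is simply the square of Multiple R (0.895² ≈ 0.801); a short sketch (Python, purely for illustration) confirms the arithmetic:

```python
# R Square is the square of the correlation coefficient (Multiple R)
multiple_r = 0.895
r_square = multiple_r ** 2
print(round(r_square, 3))  # 0.801 -> about 80% of the variability in Y is explained by X
```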
In addition to the least-squares line and the correlation coefficient (and its square,
r², the coefficient of determination), there are a few other noteworthy values in the
output of Figure 9.19.
If you look at the bottom right of the output (see vertical arrow), you see a 95%
confidence interval for each of the coefficients (i.e., intercept and slope⁴). Let's take
them one by one.
Our best estimate of the intercept is 1.1; however, a 95% confidence interval for the
true value of the intercept is −1.34 to 3.54. In fact, we can see that the intercept is
not significant, since its p-value is 0.24 (see the bent arrow in Figure 9.19). Therefore,
we cannot rule out that its true value equals zero. Quite often, however, the intercept
is not a quantity that, by itself, is of great value to us.
Now let's look at the confidence interval for the slope. Keep in mind that the
slope is crucially important; whether it's zero or not directly indicates whether the
variables are actually related. Here, we get a value for the slope of 0.833. The 95%
confidence interval for the true slope is 0.071 to 1.596. Its p-value (0.040) is below
the traditional 0.05 benchmark value. Therefore, at significance level equal to 0.05,
the slope is statistically significant.
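For readers who prefer to reproduce this kind of output outside Excel, here is a minimal sketch using Python's statsmodels library. The x and y arrays below are hypothetical placeholders (not the chapter's data set), so the printed coefficients, p-values, and confidence intervals will not match Figure 9.19, but the same quantities appear.

```python
# A sketch of obtaining Figure 9.19-style regression output with statsmodels.
# The x and y arrays are hypothetical placeholders, not the book's data set.
import numpy as np
import statsmodels.api as sm

x = np.array([1, 2, 2, 3, 3, 4], dtype=float)  # placeholder X values
y = np.array([1.5, 3.0, 2.5, 3.4, 3.6, 4.6])   # placeholder Y values

X = sm.add_constant(x)             # add the intercept column
model = sm.OLS(y, X).fit()         # ordinary least-squares fit

print(model.params)                # estimated intercept and slope
print(model.pvalues)               # p-values for intercept and slope
print(model.conf_int(alpha=0.05))  # 95% confidence intervals for each coefficient
print(model.rsquared)              # R Square
```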
³ The reader will note that the correlation is labeled “Multiple R.” This is simply reflecting oversimplification
(sloth?) on Excel's part. Excel did not want to bother writing simple R when there is only one
X, and multiple R when there is more than one X, and decided to just write multiple R no matter how
many X's there are. We obviously weren't involved in the usability testing. ☺
⁴ The reader may note that the confidence intervals for the intercept and for the slope are each written
twice! This, again, is simply reflecting laziness on Excel's part. You can specify a confidence level
other than 95%, and if you do, Excel gives that confidence interval to you, but also, automatically, gives
you the confidence interval for 95%. If you do not specify another confidence level (and one virtually
never does so), Excel gives you the 95% confidence interval as the default and then gives you the
automatic 95% interval again.
 