“Sure, how's early afternoon tomorrow?”
“Perfect, I've got my running 1pm meeting with him tomorrow. Maybe the
Chianti he likes during lunch will defuse the inevitable explosion!”
9.7 SUMMARY
In this chapter, we have introduced correlation and regression analysis. Both of these techniques deal with the relationship between a “dependent variable” or output variable that we label “Y,” and an “independent variable” or input variable that we label “X.”
The correlation, r, is a dimensionless quantity that ranges between −1 and +1, and indicates the strength and direction of a linear relationship between the two variables; the (hypothesis) test of its significance is also discussed. We also note that the coefficient of determination, r², has a direct interpretation as the proportion of variability in Y explained by X (in a linear relationship).
We consider example scatter diagrams (graphs of the X, Y points) and discuss
how they correspond with the respective values of r. We also demonstrate in both
Excel and SPSS how to obtain the correlation.
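Although the chapter demonstrates only Excel and SPSS, a minimal sketch in Python (the small data set below is invented for illustration and is not the chapter's data) computes r, the p-value of its significance test, and r²:

import numpy as np
from scipy import stats

# Invented illustrative data (not the chapter's data set)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Pearson correlation r and the two-sided p-value of its significance test
r, p_value = stats.pearsonr(x, y)

# Coefficient of determination: proportion of variability in Y explained by X
r_squared = r ** 2

print(f"r = {r:.4f}, p = {p_value:.4f}, r^2 = {r_squared:.4f}")

Excel's CORREL function and SPSS's bivariate-correlation output should agree with the r above; the p-value corresponds to the significance test mentioned in the summary.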
Regression analysis quantifies the linear relationship between Y and X, by providing a least-squares line from which we can input a value of X and obtain a predicted (best estimate) value of Y, using the line's corresponding slope and intercept. We note how to perform a regression analysis in both Excel and SPSS, and discuss various confidence intervals of interest, as well as hypothesis testing to decide if we should conclude that there truly is a linear relationship between Y and X “beyond a reasonable doubt.” In each case—correlation and regression—our illustrations use a small data set that is easier for the reader to follow, and then we apply the technique to the prototype real-world data from Behemoth.com.
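As a rough Python counterpart to the Excel and SPSS regression output discussed in the chapter (again with invented data; the 95% confidence level is a choice made here, not taken from the chapter), a sketch that fits the least-squares line, predicts Y at a new X, and tests whether the true slope is zero:

import numpy as np
from scipy import stats

# Invented illustrative data (not the Behemoth.com data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least-squares line: slope, intercept, and the t-test of "true slope = 0"
fit = stats.linregress(x, y)

# Predicted (best-estimate) Y at a new X value, from slope and intercept
x_new = 3.5
y_pred = fit.intercept + fit.slope * x_new

# 95% confidence interval for the slope, with n - 2 degrees of freedom
n = len(x)
t_crit = stats.t.ppf(0.975, df=n - 2)
ci_low = fit.slope - t_crit * fit.stderr
ci_high = fit.slope + t_crit * fit.stderr

print(f"Y-hat = {fit.intercept:.3f} + {fit.slope:.3f} * X")
print(f"predicted Y at X = {x_new}: {y_pred:.3f}")
print(f"slope p-value = {fit.pvalue:.4f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")

A small p-value for the slope plays the role of concluding “beyond a reasonable doubt” that a linear relationship truly exists.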
9.8 ADDENDUM: A QUICK DISCUSSION OF SOME ASSUMPTIONS IMPLICIT IN INTERPRETING THE RESULTS
When we perform “statistical inference” (more or less, for us, confidence intervals and hypothesis testing) in a correlation or regression analysis, there are three theoretical assumptions we are technically making.
One assumption, called “normality,” says that if we hold X constant at any (and
every) value, and were to look at many values of Y at that X value, the Y values
would form a normal distribution.
A second assumption, called “constant variability” (or often by the ugly word “homoscedasticity,” which is said to mean “constant variability” in Greek [and sometimes it is spelled with the first “c” as a “k”]), says that the normal curves for each X have the same variability (which, as we might recall from Chapter 1, we can measure by the standard deviation).
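The chapter relies on Excel and SPSS and does not show how to check these assumptions; purely as an informal sketch (the residual-based checks and the data below are illustrative assumptions, not the chapter's method), one might examine the regression residuals in Python:

import numpy as np
from scipy import stats

# Invented illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1])

fit = stats.linregress(x, y)
residuals = y - (fit.intercept + fit.slope * x)

# Normality: Shapiro-Wilk test on the residuals (a large p-value gives
# no evidence against the normality assumption)
w_stat, p_norm = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {p_norm:.3f}")

# Constant variability: compare residual spread at low versus high X
half = len(x) // 2
print(f"SD of residuals, low X:  {residuals[:half].std(ddof=1):.3f}")
print(f"SD of residuals, high X: {residuals[half:].std(ddof=1):.3f}")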
 