SIDEBAR: THE COEFFICIENT OF DETERMINATION (R-SQUARED)
Since this is the first example in which we found r for an actual data set (albeit a small one!), we
now introduce another interesting idea, one that helps interpret the actual value of r.
If we compute r², we get 0.895 × 0.895 = 0.801. The quantity r² is called the "coefficient of
determination," although it is often referred to simply as "r²."
We can give a very useful interpretation to the 0.801. Based on the data, we estimate that 80.1%
of the variability in Y (i.e., the degree to which all the Y values are not the same) can be explained
by the variability in X (i.e., the degree to which all the X values are not the same). In loose terms,
we might say that X is estimated to explain about 80.1% of Y, and if X were held constant, Y would
vary only 19.9% as much as it varies now.
In our example, Y = assessment of how sophisticated a specific design is, and X = amount of experience
buying products online. So, in that context, with an r of 0.895 and an r² of 0.801, we would say that
we estimate that about 80% of the variability in the respondents' opinions about how sophisticated the
design is can be explained by how much experience a respondent has had buying products online. By
the way, in this type of context, 80% would nearly always be considered a pretty high value!
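As a sanity check outside of SPSS, here is a minimal sketch (in Python, assuming NumPy is available, and using made-up numbers rather than our actual data set) of how r and r² relate:

import numpy as np

# Hypothetical values (not the book's data set):
# X = amount of experience buying products online,
# Y = rating of how sophisticated the design is.
x = np.array([1, 2, 3, 5, 6, 8, 9, 12])
y = np.array([2, 3, 3, 5, 4, 7, 6, 9])

r = np.corrcoef(x, y)[0, 1]   # Pearson correlation coefficient, r
r_squared = r ** 2            # coefficient of determination, r-squared

print(f"r = {r:.3f}, r-squared = {r_squared:.3f}")
# r-squared is the estimated proportion of the variability in Y
# that is explained by the variability in X.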
We now pull down "Analyze" (we noted earlier in the book that "Analyze" is
always how we begin a statistical analysis of any kind), highlight "Correlate," and go
to the submenu item, "Bivariate." See arrows in Figure 9.6.
The term "bivariate" refers to a correlation between two variables. The other choices
are more complex and are beyond the scope of this chapter.
After we click, we get the "Bivariate Correlations" dialog box, as shown in
Figure 9.7. The word "Correlations" is plural, since if your data set had, for example,
three variables/columns (say, Y, X1, X2), the output would give you the correlation
between each of the three pairs of variables: (Y, X1), (Y, X2), (X1, X2). Here, with
only two variables, we will get, of course, only one correlation value (not counting
the "1's," the correlation of a variable with itself).
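For readers comfortable with code, here is a small sketch (Python with pandas, using hypothetical numbers) of the kind of matrix that results when three variables are correlated pairwise:

import pandas as pd

# Hypothetical data set with three variables/columns.
df = pd.DataFrame({
    "Y":  [2, 3, 3, 5, 4, 7, 6, 9],
    "X1": [1, 2, 3, 5, 6, 8, 9, 12],
    "X2": [4, 4, 5, 6, 6, 7, 8, 8],
})

# Pearson correlation for every pair: (Y, X1), (Y, X2), (X1, X2),
# with 1's on the diagonal (each variable correlated with itself).
print(df.corr())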
In Figure 9.7, we need to drag the Y and X over to the right-side box called "Variables."
There is no need to change the defaults, including the check next to "Pearson"
(see sidebar).
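If you prefer to verify a result programmatically, a rough analogue of the default Pearson output can be produced outside SPSS; the sketch below assumes Python with SciPy and hypothetical data:

from scipy import stats

x = [1, 2, 3, 5, 6, 8, 9, 12]   # hypothetical X values
y = [2, 3, 3, 5, 4, 7, 6, 9]    # hypothetical Y values

# Pearson r plus the two-tailed p-value, roughly what the
# "Bivariate Correlations" output reports by default.
r, p_two_tailed = stats.pearsonr(x, y)
print(f"Pearson r = {r:.3f}, two-tailed p = {p_two_tailed:.4f}")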
SIDEBAR: KARL PEARSON
What we are finding is, technically, the Pearson correlation coefficient, named after the mathematician
and biometrician (i.e., bio-statistician) Karl Pearson (1857-1936). He was born Carl Pearson,
but changed his name purposely and officially to Karl, since he was a fervent fan of Karl Marx.
In 1911 he founded the world's first university statistics department at University College,
London. In addition to his work on the correlation between variables, Dr. Pearson also headed up
the work on the chi-square test we worked with in an earlier chapter. We noted that it was called the
"Pearson chi-square test."
Another claim to fame, although he didn't know it at the time, was that when the 23-year-old
Albert Einstein (http://en.wikipedia.org/wiki/Albert_Einstein) started a study group, the Olympia
Academy, he suggested that the first book to be read was Karl Pearson's The Grammar of Science.
Dr. Pearson had two daughters and a son. His son, Egon Pearson, became a prominent statistician
in his own right, and succeeded his father as head of the Applied Statistics Department at
University College. Egon Pearson, along with Jerzy Neyman, another prominent statistician, developed
the basics of hypothesis testing as we know it today (improving ideas that had been earlier
considered by Karl Pearson).
 