Is there any way to resolve this conundrum? Indeed there is: We must accept that there is no g!
That is to say, there is no single pattern that characterizes overall intelligence across cultures (or, in
the temperature example, there is no single pattern of variation that characterizes the temperature
history at all locations). There are two patterns of near equal importance in the dataset, and their
relative prominence will depend on the precise convention that we use. The only fail-safe approach
is to recognize that both patterns are necessary to describe the data. We must not overstate the
importance of PC#1 alone. We must, in short, avoid the pitfall of reification.
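The convention dependence described above can be made concrete with a small synthetic sketch (hypothetical data, not Spearman's test scores or any climate dataset): two groups of variables carry two patterns of near-equal importance, and whether PC#1 captures the first or the second pattern depends entirely on whether the variables are merely centered or also standardized before the analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
t = np.linspace(0.0, 1.0, n, endpoint=False)
s1, s2 = np.sin(2 * np.pi * t), np.sin(4 * np.pi * t)  # two orthogonal signals

# Group A: large-amplitude, somewhat noisy variables following pattern s1.
# Group B: small-amplitude, very clean variables following pattern s2.
A = np.column_stack([3.0 * s1 + 0.5 * rng.normal(size=n) for _ in range(5)])
B = np.column_stack([1.0 * s2 + 0.05 * rng.normal(size=n) for _ in range(5)])
X = np.hstack([A, B])

def pc1_loadings(data):
    """Absolute loadings of the leading principal component (via SVD)."""
    centered = data - data.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return np.abs(Vt[0])

# Convention 1: center only ("covariance" PCA). Raw amplitude dominates,
# so PC#1 is the group-A pattern.
cov_loadings = pc1_loadings(X)

# Convention 2: center and standardize ("correlation" PCA). Every variable
# now has unit variance, and the cleaner group-B pattern takes over PC#1.
corr_loadings = pc1_loadings((X - X.mean(axis=0)) / X.std(axis=0))

print("covariance PCA  -> PC#1 loads on group A:",
      cov_loadings[:5].sum() > cov_loadings[5:].sum())
print("correlation PCA -> PC#1 loads on group B:",
      corr_loadings[5:].sum() > corr_loadings[:5].sum())
```

Both conventions are defensible, yet they single out different "leading" patterns; only by retaining enough PCs to describe both patterns does the analysis avoid the reification trap.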
Principal component analysis simply provides a convenient way of efficiently summarizing
information in a large dataset. Any appropriate application of PCA must retain enough PCs to
describe all significant patterns in that data. Retaining only a single leading pattern will not in general
achieve that goal. As Gould explains in The Mismeasure of Man, by retaining only PC#1 and
unjustifiably throwing out the rest of the variation in the data, Spearman, and Burt after him, discarded
significant information that conflicted with their hypothesis of a single, culturally universal, unique
measure of intelligence. Such data, Gould notes, include culturally specific factors (e.g., the relative
demands in a particular culture on individuals to develop, say, verbal as compared to arithmetic
skills) that belie the concept of a simple, objective, universal metric of innate intelligence. Indeed,
had they adopted just a slightly different convention, keeping only PC#1 would have led them to an
entirely different conclusion.
With Spearman and Burt, the arcane tool of PCA had been misapplied to putative metrics of
human intelligence to support theories of a racial basis for intelligence. With McIntyre (and colleague
McKitrick), it was—as we shall now see—misapplied to sets of tree ring records to support a
critique of climate change research. If there is a lesson in this curious confluence, it is that scientific
findings that rest on such technical complexities are prone to abuse by those with a potential ax to
grind. Inappropriate decisions made in the statistical analysis can have profound consequences for the
results. Given the complexities, it's easy enough to make mistakes. For those with an agenda, it is
even easier to overlook them or, worse, exploit them intentionally.
Hiding the Hockey Stick
While the specifics were of course different, McIntyre and McKitrick in their critique of our work
had in essence committed the same statistical error as had Spearman and Burt. The MBH98 set of
various proxy data, as noted in chapter 4, was heavily weighted toward tree ring data. Had we not
taken appropriate precautions to deal with that issue, our reconstruction would have been largely
determined by the tree ring data alone—no doubt, something that our critics would have jumped on us
for. So we used PCA to represent the dense networks of tree ring data in terms of smaller numbers of
representative patterns of variation in each region (North America, Eurasia, etc.).
We employed a standard, objective criterion for determining how many PCs should be kept for
each region. This criterion is known as a “selection rule,” and it is derived using the very same sorts
of Monte Carlo techniques I described in chapter 1. One creates various surrogate datasets that in key
respects have the same attributes as the actual data (same size, same overall amplitude of variation,
etc.), but that are randomized in a way that destroys any significant structure in the data. By comparing
how much variation is resolved in the actual PCs of the data relative to the PCs of the randomized
surrogates, one can determine which leading PCs stand out above what chance alone would produce, and therefore how many must be retained.
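A selection rule of this general kind can be sketched in a few lines of Python. This is a simplified, Preisendorfer-style "Rule N" illustration using white-noise surrogates, not the exact MBH98 procedure; the function name and the toy data are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def rule_n_retained(data, n_trials=200, quantile=0.95):
    """Monte Carlo selection rule: keep each leading PC whose explained
    variance exceeds the chosen percentile of what randomized surrogates
    of the same size and overall amplitude produce by chance."""
    n, p = data.shape
    X = data - data.mean(axis=0)
    eigvals = np.linalg.svd(X, compute_uv=False) ** 2 / (n - 1)

    # Surrogates: same shape and same overall amplitude of variation,
    # but pure noise, so any significant structure is destroyed.
    null = np.empty((n_trials, p))
    for trial in range(n_trials):
        R = rng.normal(scale=X.std(), size=(n, p))
        R -= R.mean(axis=0)
        null[trial] = np.linalg.svd(R, compute_uv=False) ** 2 / (n - 1)
    thresholds = np.quantile(null, quantile, axis=0)

    # Retain leading PCs until one fails to beat the noise benchmark.
    k = 0
    while k < p and eigvals[k] > thresholds[k]:
        k += 1
    return k

# Toy network of 10 "proxy" series containing two genuine patterns plus
# noise: the rule should retain two PCs, not just the leading one.
t = np.linspace(0.0, 1.0, 300, endpoint=False)
signal = (3.0 * np.outer(np.sin(2 * np.pi * t), [1] * 5 + [0] * 5)
          + 3.0 * np.outer(np.cos(2 * np.pi * t), [0] * 5 + [1] * 5))
data = signal + 0.3 * rng.normal(size=(300, 10))
print(rule_n_retained(data))  # both genuine patterns survive the test
```

The essential point is that the number of PCs retained is decided by an objective comparison against randomness, not by an analyst's preference for a single leading pattern.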