surrogate data,18 one obtains an objective answer to what we have already seen to be the crucial
question: How many significant PCs are there in the actual data? Such a criterion might indicate that
one needs to keep the PCs that resolve the leading 50 percent of variation in the data, but one could
alternatively find that as much as 90 percent or as little as 10 percent of the variation in the data must
be retained. The precise answer will depend on the characteristics of the data at hand.
For this procedure to be valid, the random datasets must be treated with precisely the same
statistical conventions as the original data. This issue is a nontrivial one because, as we have seen,
there are different possible conventions for how the data might be centered and, as we have also seen,
this choice can play a crucial role in determining the relative ordering of the various patterns in the
data. We chose to use the same twentieth-century base period we had used for the instrumental data
for centering the proxy data when performing the PCA step—a modern-centering convention.19 The
same convention was, therefore, as it needs to be, also used for all of our random surrogates in
determining how many PCs to keep.
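The procedure just described can be sketched numerically. The code below is a minimal illustration, not the actual analysis: it implements a Monte Carlo selection rule in the spirit of Preisendorfer's Rule N, where the Gaussian surrogates, the 95th-percentile cutoff, and the `center_rows` parameter are all assumptions chosen for the sketch. The key point it encodes is that the surrogates are centered with exactly the same convention as the real data.

```python
import numpy as np

def eig_fractions(X, center_rows):
    """Variance fraction carried by each PC, after centering every series
    over the rows selected by `center_rows` (computed via SVD)."""
    Xc = X - X[center_rows].mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s**2 / np.sum(s**2)

def n_significant_pcs(data, center_rows, n_surrogates=200, q=95, seed=0):
    """Count leading PCs whose variance fraction beats the q-th percentile
    of the same-ranked fraction from random surrogate datasets that are
    treated with the SAME centering convention as the real data."""
    rng = np.random.default_rng(seed)
    actual = eig_fractions(data, center_rows)
    surr = np.array([eig_fractions(rng.normal(size=data.shape), center_rows)
                     for _ in range(n_surrogates)])
    thresh = np.percentile(surr, q, axis=0)
    k = 0
    while k < len(actual) and actual[k] > thresh[k]:
        k += 1
    return k

# A dataset sharing one strong signal should retain at least one PC.
rng = np.random.default_rng(1)
signal = np.sin(np.linspace(0, 6, 100))
data = (np.outer(signal, rng.normal(1.0, 0.2, 30))
        + 0.1 * rng.normal(size=(100, 30)))
kept = n_significant_pcs(data, slice(0, 100))
```

Because pure noise occasionally produces a large leading eigenvalue by chance, the surrogate ensemble supplies the null distribution against which the real eigenvalues are judged, which is what makes the answer objective rather than a matter of taste.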
The North American tree ring data, as previously described, played a particularly important role
in our analysis. Applying our selection rule to these data, using a modern centering convention, indicated that the leading two PC series should be retained. PC#1 emphasized the tree ring data from high-elevation sites in the western United States, which, as discussed in chapter 4, contained a key
long-term temperature signal, the hockey stick signature of a cold Little Ice Age interval followed by
pronounced twentieth-century warming. PC#2 emphasized lower-elevation tree ring series, which
showed less of a twentieth-century trend.
McIntyre and McKitrick used a different PCA convention in their 2005 paper. They centered the
tree ring data over the long term (1400-1980). That's fine—in fact, long-term centering is actually the
traditional convention, and given the fodder our less traditional modern centering convention has
provided for climate change deniers, I wish we'd used the long-term centering from the start. It
doesn't make any difference which convention you use; you get the same final answer in the
procedure as long as you do the analysis correctly.
McIntyre and McKitrick got a dramatically different answer by not doing the analysis correctly.
Their error is easy to understand using our synthetic PCA example. As we saw above, using a
modern-centering convention (centering over the final fifty of the one hundred years in the example),
the global warming pattern was carried by PC#1, describing 55 percent of the variation in the data,
while PC#2—the oscillation pattern—described the remaining 45 percent. Let's imagine that our
selection rules told us that we should retain any PC explaining at least 45 percent of the total
variation in the data. We would end up keeping both patterns, PC#1 and PC#2, resolving all of the
important information in the dataset (the long-term trend and the oscillation).
Suppose instead that we used that same retention criterion (PCs resolving at least 45 percent of
the variation in the data) that had been derived for the modern centering, and misapplied it to PCA
results where the alternative, long-term centering (centering the data over the entire hundred years)
had been used. Since the global warming pattern in that case showed up as PC#2, explaining only 40
percent of the variation in the data, it would end up on the cutting room floor, just missing our threshold
for retention. By misapplying a selection rule derived for one convention (modern centering) to PCA
results based on a different convention (long-term centering), we would end up erroneously throwing
out the proverbial baby with the bathwater.
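The effect of mismatched conventions can be reproduced in a toy calculation. The sketch below is illustrative only: the synthetic "warming" ramp, oscillation, loadings, and noise level are invented for this example and will not reproduce the 55/45/40 percent figures quoted above. It centers the same hundred-year dataset two different ways and reports the variance fraction carried by each PC under each convention, so one can see how a threshold tuned to one convention can misjudge the other.

```python
import numpy as np

rng = np.random.default_rng(0)
years = np.arange(100)

# Two underlying patterns: a late-century warming ramp and an oscillation.
warming = np.where(years >= 50, (years - 50) / 50.0, 0.0)
oscillation = np.sin(2 * np.pi * years / 25.0)

# Forty synthetic series, each a noisy mix of the two patterns.
n_series = 40
data = (np.outer(warming, rng.normal(1.0, 0.3, n_series))
        + np.outer(oscillation, rng.normal(1.0, 0.3, n_series))
        + 0.1 * rng.normal(size=(100, n_series)))

def pca_explained(X, center_rows):
    """Center each series over `center_rows`, then return the fraction
    of total variance explained by each PC (via SVD)."""
    Xc = X - X[center_rows].mean(axis=0)
    s = np.linalg.svd(Xc, compute_uv=False)
    return s**2 / np.sum(s**2)

modern = pca_explained(data, slice(50, 100))    # center on last 50 years
longterm = pca_explained(data, slice(0, 100))   # center on full period

print("modern centering, leading PCs:  ", modern[:3].round(2))
print("long-term centering, leading PCs:", longterm[:3].round(2))
```

Under modern centering, the pre-1950 portion of the warming pattern sits far from the (late-period) mean, inflating the variance attributed to that pattern; under long-term centering the same pattern carries less of the total variance. A retention threshold derived under one convention therefore cannot be carried over to the other.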
That's precisely what McIntyre and McKitrick did with the North American tree ring data. They