data has an associated RE score. We set a benchmark based on the idea that, if the proxy
data was actually informative about the real world, it had to yield a higher RE score than
most of the (uninformative) artificial data. Mann had done the same thing, but had not taken
into account the effect of the erroneous PC method. The real proxy data didn't turn out to
be more informative than red noise, but because he set his benchmark too low, his proxy
results looked statistically significant when in reality they weren't.
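The logic of that benchmarking exercise can be sketched in a few lines of Python (a purely illustrative toy, assuming a single proxy, a simple linear calibration and made-up AR(1) persistence values; it is not the actual MBH98 or McIntyre-McKitrick code):

    import numpy as np

    rng = np.random.default_rng(0)

    def red_noise(n, rho, rng):
        # AR(1) "red noise": each value is a damped copy of the previous one plus a shock
        x = np.zeros(n)
        for t in range(1, n):
            x[t] = rho * x[t - 1] + rng.standard_normal()
        return x

    def re_score(obs, pred, calib_mean):
        # Reduction of error: 1 minus (squared error of the reconstruction)
        # divided by (squared error of simply guessing the calibration-period mean)
        return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - calib_mean) ** 2)

    n_years, n_calib = 150, 70                    # arbitrary illustrative lengths
    temperature = red_noise(n_years, 0.3, rng)    # stand-in for an observed temperature record
    calib = slice(n_years - n_calib, n_years)     # recent years: used to fit the model
    verif = slice(0, n_years - n_calib)           # earlier years: held out for verification

    def benchmark_trial(rng):
        # Fit an uninformative red-noise "proxy" to temperature in the calibration
        # period, then see what RE score the fitted reconstruction achieves anyway.
        proxy = red_noise(n_years, 0.7, rng)
        slope, intercept = np.polyfit(proxy[calib], temperature[calib], 1)
        recon = slope * proxy + intercept
        return re_score(temperature[verif], recon[verif], temperature[calib].mean())

    scores = [benchmark_trial(rng) for _ in range(1000)]
    benchmark = np.percentile(scores, 99)
    print(f"RE benchmark (99th percentile of red-noise trials): {benchmark:.2f}")

Real proxy data only deserves to be called significant if its RE score clears the kind of threshold computed in the last two lines; set that threshold too low and even noise will pass.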
There was a big red flag in his calculations that should have tipped him off. Another
model test is called the r² score. It has the nice feature that you don't need to do Monte
Carlo simulations: standard benchmark tables are available in any statistics textbook. 10
While Mann reported the (favourable) r² scores for the later portion of his graph, 11
he didn't mention them for the early portion (pre-1750), where they were nearly zero,
indicating a lack of statistical significance. Instead he only reported the RE score, which
he thought indicated significance. He showed the reader the RE test that he thought
(incorrectly) was favourable, yet he kept referring to significance tests in the plural in
support of his claims, so the reader would naturally assume the unreported r² scores looked
good too. 12
They didn't, but he failed to report that in the article. And as we later showed, the
r² and RE scores were actually saying the same thing, namely that the hockey stick was
uninformative as an indicator of past temperatures.
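For reference, the two verification statistics at issue have standard textbook definitions (notation here is mine: x_i are the observed values in the verification period, \hat{x}_i the reconstructed values, \bar{x}_c the calibration-period mean, and \bar{x}, \bar{\hat{x}} the verification-period means):

    RE = 1 - \frac{\sum_i (x_i - \hat{x}_i)^2}{\sum_i (x_i - \bar{x}_c)^2},
    \qquad
    r^2 = \frac{\left[ \sum_i (x_i - \bar{x})(\hat{x}_i - \bar{\hat{x}}) \right]^2}
               {\sum_i (x_i - \bar{x})^2 \, \sum_i (\hat{x}_i - \bar{\hat{x}})^2}

RE asks whether the reconstruction beats the naive guess of the calibration mean, while r² measures how well the reconstruction tracks the ups and downs of the observations; an r² near zero means the reconstruction is essentially uncorrelated with the real record over the verification period.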
Stickhandling
In 2005, following an article on the dispute in The Wall Street Journal, Mann had been sent
a list of questions by the Energy and Commerce Committee of the US Congress, one of
which was whether he had computed the r² score. His answer was:
My colleagues and I did not rely on this statistic in our assessments of “skill” (i.e., the reliability of a statistical model,
based on the ability of a statistical model to match data not used in constructing the model) because, in our view, and in
the view of other reputable scientists in the field, it is not an adequate measure of “skill.” The statistic used by Mann et
al. 1998, the reduction of error, or “RE” statistic, is generally favored by scientists in the field. 13
The answer is classic misdirection. He was not asked: 'Did you rely on the r² score when
assessing your results?' There was no need to ask that: if he had relied on it he would never
have claimed his results were significant. He only claimed significance by ignoring it. The
question specifically was whether he computed r². Tellingly, in his reply he changed the
subject. But it hardly matters. Either he did not compute it, in which case he was lying in