data has an associated RE score. We set a benchmark based on the idea that, if the proxy
data was actually informative about the real world, it had to yield a higher RE score than
most of the (uninformative) artificial data. Mann had done the same thing, but had not taken
into account the effect of the erroneous PC method. The real proxy data didn't turn out to
be more informative than red noise, but because he set his benchmark too low, his proxy
results looked statistically significant when in reality they weren't.
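The logic of that benchmarking exercise can be sketched in a few lines of Python (a purely illustrative toy, assuming a single proxy, a simple linear calibration and made-up AR(1) persistence values; it is not the actual MBH98 or McIntyre-McKitrick code):

    import numpy as np

    rng = np.random.default_rng(0)

    def red_noise(n, rho, rng):
        # AR(1) "red noise": each value is a damped copy of the previous one plus a shock
        x = np.zeros(n)
        for t in range(1, n):
            x[t] = rho * x[t - 1] + rng.standard_normal()
        return x

    def re_score(obs, pred, calib_mean):
        # Reduction of error: 1 minus (squared error of the reconstruction)
        # divided by (squared error of simply guessing the calibration-period mean)
        return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - calib_mean) ** 2)

    n_years, n_calib = 150, 70                    # arbitrary illustrative lengths
    temperature = red_noise(n_years, 0.3, rng)    # stand-in for an observed temperature record
    calib = slice(n_years - n_calib, n_years)     # recent years: used to fit the model
    verif = slice(0, n_years - n_calib)           # earlier years: held out for verification

    def benchmark_trial(rng):
        # Fit an uninformative red-noise "proxy" to temperature in the calibration
        # period, then see what RE score the fitted reconstruction achieves anyway.
        proxy = red_noise(n_years, 0.7, rng)
        slope, intercept = np.polyfit(proxy[calib], temperature[calib], 1)
        recon = slope * proxy + intercept
        return re_score(temperature[verif], recon[verif], temperature[calib].mean())

    scores = [benchmark_trial(rng) for _ in range(1000)]
    benchmark = np.percentile(scores, 99)
    print(f"RE benchmark (99th percentile of red-noise trials): {benchmark:.2f}")

Real proxy data only deserves to be called significant if its RE score clears the kind of threshold computed in the last two lines; set that threshold too low and even noise will pass.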
There was a big red flag in his calculations that should have tipped him off. Another
model test is called the r² score. It has the nice feature that you don't need to do Monte
Carlo simulations: standard benchmark tables are available in any statistics textbook. 10
While Mann reported the (favourable) r² scores for the later portion of his graph, 11
he didn't mention them for the early portion (pre-1750), where they were nearly zero,
indicating a lack of statistical significance. Instead he only reported the RE score, which
he thought indicated significance. He showed the reader the RE test that he thought
(incorrectly) was favourable, yet he kept referring to significance tests in the plural in
support of his claims, so the reader would naturally assume the unreported r² scores looked
good too. 12
They didn't, but he failed to report that in the article. And as we later showed, the
r² and RE scores were actually saying the same thing, namely that the hockey stick was
uninformative as an indicator of past temperatures.
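For reference, the two verification statistics at issue have standard textbook definitions (notation here is mine: x_i are the observed values in the verification period, \hat{x}_i the reconstructed values, \bar{x}_c the calibration-period mean, and \bar{x}, \bar{\hat{x}} the verification-period means):

    RE = 1 - \frac{\sum_i (x_i - \hat{x}_i)^2}{\sum_i (x_i - \bar{x}_c)^2},
    \qquad
    r^2 = \frac{\left[ \sum_i (x_i - \bar{x})(\hat{x}_i - \bar{\hat{x}}) \right]^2}
               {\sum_i (x_i - \bar{x})^2 \, \sum_i (\hat{x}_i - \bar{\hat{x}})^2}

RE asks whether the reconstruction beats the naive guess of the calibration mean, while r² measures how well the reconstruction tracks the ups and downs of the observations; an r² near zero means the reconstruction is essentially uncorrelated with the real record over the verification period.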
Stickhandling
In 2005, following an article on the dispute in The Wall Street Journal, Mann had been sent
a list of questions by the Energy and Commerce Committee of the US Congress, one of
which was whether he had computed the r² score. His answer was:
My colleagues and I did not rely on this statistic in our assessments of “skill” (i.e., the reliability of a statistical model,
based on the ability of a statistical model to match data not used in constructing the model) because, in our view, and in
the view of other reputable scientists in the field, it is not an adequate measure of “skill.” The statistic used by Mann et
al. 1998, the reduction of error, or “RE” statistic, is generally favored by scientists in the field. 13
The answer is classic misdirection. He was not asked: 'Did you rely on the r² score when
assessing your results?' There was no need to ask that: if he had relied on it he would never
have claimed his results were significant. He only claimed significance by ignoring it. The
question specifically was whether he computed r². Tellingly, in his reply he changed the
subject. But it hardly matters. Either he did not compute it, in which case he was lying in