As we noted, the traditional cutoff point for a “very small chance of occurring” is
0.05. Admittedly, this is an arbitrary value (see the sidebar below), but it is a standard
that statisticians use virtually all the time. Think of it this way: if an event
occurs only 5 out of 100 times, that's a 5% probability. Pretty small indeed. So, the
p-value is a measure of the strength of your evidence against H0: the smaller the
p-value, the stronger the evidence. Put another way, a small p-value says that the
null hypothesis fails to account for the “facts” of the case, and those “facts” are the
data you collected during your survey, usability test, or any other method you used.
SIDEBAR: THE p-VALUE CUTOFF POINT AND THE TEA-TASTING LADY
As we noted, the traditional cutoff point for “a small chance of occurring” is 0.05. This is an
arbitrary value, but it is used virtually all the time. It would run contrary to tradition to use a
different value, although, of course, you can.
To a large extent, the use of 0.05 traces back to an off-the-cuff remark made by a famous
statistician at a tea party at a university in England in the early to mid-1900s.
We're speaking of Sir Ronald Fisher (February 1890-July 1962), arguably the most prominent
statistician of the twentieth century. He was a bit of a recluse who, more than most, did not
“suffer fools,” and while not especially well liked, he was universally acknowledged as brilliant.
Part of his obituary, written by the statistician Kendall (1963) in the journal Biometrika, stated,
“In character he was, let us admit it, a difficult man.” He sometimes wrote pieces on statistical
thinking that other statisticians of the day found hard to understand. Still, his brilliance often
shone through.
He wrote a book that began by telling a story. The story was this: a woman entered his office
one day and said that by tasting a cup of tea, she could tell whether the tea or the milk had been
put into the cup first. Fisher built up a complete text, primarily dealing with the ideas of
hypothesis testing, based on this example.
How many cups of tea must the woman identify correctly before we should believe her claim?
Should there be the same number of cups of tea of each type? What if she got 90% (not 100%) of
the assessments right? 90% is, of course, less than perfection, but it is so far above 50% that it
is very unlikely to occur if she really could not tell from the taste how the tea had been
prepared (Fisher, 1935).
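To put a rough number on "very unlikely," here is a minimal sketch. It assumes, purely for illustration, a 10-cup test in which each cup is an independent 50/50 guess; that setup is not Fisher's actual design, and the function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of k or more correct calls out of n independent guesses,
    each correct with probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumed scenario: 9 or more correct out of 10 cups by pure guessing.
p_guessing = prob_at_least(9, 10)
print(round(p_guessing, 4))  # 0.0107
```

Under these assumptions, a pure guesser would reach 90% or better only about 1% of the time, which is well below the 0.05 cutoff.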
Fisher never formally revealed whether the lady was correct in her assertion, but it is
reported in a book by Fisher's daughter (Box, 1978) that the woman did identify all 8 cups of tea
correctly, 4 with the tea having been put in first and 4 with the milk having been put in first.
If the woman had no discerning skill and knew that the (4, 4) split existed, this result would
happen by chance alone with a probability of only about 0.014 (1.4%). This, in effect, is our
p-value in this experiment.
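The 0.014 figure can be checked with a short sketch. It assumes the taster knows exactly 4 of the 8 cups are milk-first, so a pure guess amounts to picking one of the equally likely sets of 4 cups; the function and variable names are illustrative only.

```python
from itertools import combinations
from math import comb


def p_all_correct(cups=8, milk_first=4):
    """Chance of picking exactly the right milk-first cups by guessing alone."""
    return 1 / comb(cups, milk_first)


# Brute-force check: label cups 0..7 and suppose cups 0-3 are truly milk-first.
truth = {0, 1, 2, 3}
guesses = list(combinations(range(8), 4))        # every possible set of 4 cups
hits = sum(1 for g in guesses if set(g) == truth)

print(len(guesses), hits)            # 70 1
print(f"{p_all_correct():.4f}")      # 0.0143
```

Only 1 of the 70 possible selections is entirely correct, so the probability is 1/70, or about 0.014.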
In any event, the story goes that at a tea party at a university in England, Fisher was walking
by a group of people, in a hurry, when one asked him how low a probability is “too low” to retain
credibility. Fisher kind of waved the question away as he continued his walk without pause (in
his all-too-often rude manner) and said something like, “Oh! About 1 in 20!!” Fisher acknowledged
this “0.05” value more formally when he was quoted as saying, “If the p-value is between 0.1 and
0.9, there is certainly no reason to suspect the hypothesis tested. If it is below 0.02, it is strongly
indicated that the hypothesis fails to account for the whole of the facts. We shall not often be astray
if we draw a conventional line at 0.05.”
Now, let's go back to our design example and note how the p-value makes the
process so simple. But this time, let's assume the new design garnered a mean
satisfaction rating of 4.50. Again, for reference, our beloved hypotheses: