Table 4.4 Results of Fisher Exact Test for Each Task in Table 4.1, Comparing
Novix and Behemoth Search Engines

Task   Novix pass   Behemoth pass   p-value
1      3            9               0.020
2      2            7               Not significant, but "close" = 0.07
3      1            8               0.005
4      5            6               Not significant > 0.500
5      7            8               Not significant > 0.500
6      7            9               Not significant > 0.500
7      1            9               0.011
To complete the comparisons for all the tasks in your test, you would simply repeat
the aforementioned process, in either Excel or SPSS, for each of the seven tasks. The
results are summarized in Table 4.4, which shows the p-value for each task when we
compare the Novix search engine with the existing Behemoth search engine. Remember,
any p-value below 0.05 is considered statistically significant and leads to the
conclusion that, beyond a reasonable doubt, the pass rates are different. (By the way,
the p-values you see in Table 4.4 for tasks 4-6 are not typos. When the p-value is way
above .05, it is common to report a higher lower bound, to give the reader the proper
impression of how far from significant the result is. We could have easily written
"> .05" instead of "> .500" to indicate non-significance, but "> .500" tells the reader
that the differences in completion proportions for these tasks were nowhere near
significant.)
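If you have neither SPSS nor a statistics package at hand, the Fisher exact p-values can also be computed directly. What follows is a minimal Python sketch, not the procedure used in the text. The function name `fisher_exact_two_sided` is hypothetical, and the group size of 10 participants per search engine is an assumption, since Table 4.1 is not reproduced in this section. The sketch uses a common definition of the two-sided p-value: the total probability of every possible table that is no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(pass_a, n_a, pass_b, n_b):
    """Two-sided Fisher exact p-value for a 2x2 pass/fail table.

    pass_a, pass_b: number of participants who passed in each group.
    n_a, n_b: group sizes (ASSUMED to be 10 each in the examples below).
    """
    total_pass = pass_a + pass_b
    n = n_a + n_b
    denom = comb(n, n_a)

    def prob(k):
        # Hypergeometric probability that group A contains exactly k passes,
        # given the fixed row and column totals.
        return comb(total_pass, k) * comb(n - total_pass, n_a - k) / denom

    lowest = max(0, n_a - (n - total_pass))
    highest = min(n_a, total_pass)
    p_observed = prob(pass_a)

    # Sum the probabilities of all tables at least as extreme as the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(
        prob(k)
        for k in range(lowest, highest + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )


# (Novix passes, Behemoth passes) for tasks 1-3 of Table 4.4.
# The group size of 10 is an assumption, not a value stated in this section.
for task, (novix, behemoth) in {1: (3, 9), 2: (2, 7), 3: (1, 8)}.items():
    p = fisher_exact_two_sided(novix, 10, behemoth, 10)
    print(f"Task {task}: p = {p:.3f}")
# Task 1: p = 0.020
# Task 2: p = 0.070
# Task 3: p = 0.005
```

Under the assumed group size, these values agree with the entries for tasks 1 through 3 in Table 4.4 to the precision shown there.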
If you ran the chi-square test instead of the Fisher exact test (presumably because
you were using Excel and did not have access to SPSS), the results would have been
nearly the same. The only material change would be for task 2: instead of the 0.07
from the Fisher exact test (as noted, "close" but technically not significant), the
chi-square test gives a p-value of 0.025, which is significant. So, if you were using
Excel and the chi-square test, you would reject H0 and conclude that the pass rates
differ. The "right answer," however, is 0.07. Because that value is so close to 0.05,
it would likely lead you to reserve judgment on task 2 and view it as borderline,
which in practice is not dramatically different from rejecting H0. As we said a little
while ago, the lack of a Fisher exact test is a shortcoming of Excel.
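To see where the 0.025 for task 2 comes from, here is a minimal Python sketch of the chi-square test on a 2x2 pass/fail table. It is an illustration under stated assumptions, not the Excel procedure itself: the function name `chi_square_2x2` is hypothetical, the group size of 10 per search engine is assumed because Table 4.1 is not reproduced in this section, and the statistic is computed without a continuity correction. For one degree of freedom, the upper-tail probability of the chi-square distribution equals erfc(sqrt(chi2 / 2)), so only the standard library is needed.

```python
from math import erfc, sqrt


def chi_square_2x2(pass_a, n_a, pass_b, n_b):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (chi-square statistic, p-value). Assumes every row and column
    total is nonzero; otherwise the statistic is undefined.
    """
    a, b = pass_a, n_a - pass_a  # group A: pass, fail
    c, d = pass_b, n_b - pass_b  # group B: pass, fail
    n = n_a + n_b

    # Shortcut formula for the 2x2 Pearson statistic.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Task 2: 2 Novix passes vs. 7 Behemoth passes, ASSUMING 10 users per engine.
chi2, p = chi_square_2x2(2, 10, 7, 10)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
# chi-square = 5.05, p = 0.025
```

With the assumed group sizes, the expected number of passes per engine for task 2 is only 4.5, which is the kind of small-sample situation in which the chi-square approximation and the Fisher exact test can disagree.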
4.4 MEANWHILE, BACK AT BEHEMOTH.COM
Now that you have the hard, cold facts about “Turbo Search,” you sure wish you'd gotten
a shot at it before Behemoth spent $80 million making the Palo Alto techno-hipsters
rich beyond their wildest dreams. Now, you have to break the news to Hans Blitz.
You write up your findings as quickly as possible. But instead of calling your typical
presentation meeting with the entire UX team, you decide you should give Hans a
“heads up.” You catch him at the espresso machine, and hand him the report. Sipping
his iced doppio, he skims the high-level findings. His brow furrows in disbelief:
“Are you frickin' kidding me? This is horrifying! The current search engine is
doing better, and we spent 80 mil!”