CHAPTER 3
Comparing two designs (or anything else!) using paired sample t-tests
3.1 INTRODUCTION
So, how do you feel, UX analytics guru? Did you blow 'em away with your stats
prowess? How many impressed looks did you get when you started to talk “p-values”?
Well, don't get too cocky yet. The scenario we introduced with Mademoiselle La La
in the previous chapter was pretty straightforward. You just launched a survey with two
different designs to two different groups and just sat back to see which one would win.
The reality is that you often don't have the luxury of obtaining even the moderately
small sample sizes illustrated in the previous chapter. Why? Because much of
your job invariably revolves around conducting good, old-fashioned usability tests.
Standard usability tests are usually conducted with small sample sizes of 5 to 10
participants because (1) it's widely held that larger sample sizes reveal few
additional problems, and (2) conducting traditional lab studies with large
populations is both time consuming and expensive.
SIDEBAR: SAMPLE SIZE: HOW MANY PARTICIPANTS DO YOU NEED
FOR A USABILITY TEST?
One of the most contentious issues in the usability-testing field has been the appropriate sample
size of participants needed to produce credible results. “How many participants are enough?” is
an enduring question for practitioners, who often follow their intuition instead of relying on the
research. They can hardly be blamed, since the research is sometimes contradictory. Because sample
size is a key decision that must be made before recruiting for the test, the debate only muddies
the waters when assessing the reliability of usability testing as a practice.
Virzi (1992), Nielsen and Landauer (1993), and Lewis (1994) have published influential articles
on the topic of sample size in usability testing. In these articles, the authors presented a
mathematical model for determining the sample size for usability tests. The authors presented
empirical evidence for the models and made several important claims:
• Most usability problems are detected with three to five subjects.
• Running additional subjects during the same test is unlikely to reveal new information.
• Most severe usability problems are detected by the first few subjects. This claim is supported
by Virzi's data, but not by Lewis's data or by Law and Hvannberg's (2004) data.
Virzi's stated goal was to improve return on investment in product development by reducing
the time and cost involved in product design. Nielsen and Landauer (1993) replicated and extended
Virzi's (1992) original findings and reported case studies that supported their claims for needing
only small sample sizes for usability tests.
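The sidebar mentions a mathematical model but does not spell it out. In its commonly cited form, the expected share of problems found by n participants is 1 - (1 - p)^n, where p is the chance that any one participant runs into a given problem. The Python sketch below illustrates that form only. The function names are hypothetical, and the value p = 0.31 is an assumption: it is the average often quoted from Nielsen and Landauer's case studies, and your own product may behave quite differently.

```python
import math


def proportion_found(p: float, n: int) -> float:
    """Expected share of usability problems found by n participants.

    Assumes every participant independently detects any given problem
    with the same probability p (a simplification of real test sessions).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def participants_needed(p: float, target: float) -> int:
    """Smallest n for which proportion_found(p, n) reaches the target share."""
    if not 0.0 < p < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("p and target must be strictly between 0 and 1")
    # Solve 1 - (1 - p)^n >= target for n, then round up.
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    p = 0.31  # assumed average detection rate; not taken from this chapter
    for n in (1, 3, 5, 10):
        print(f"{n:2d} participants: {proportion_found(p, n):.0%} of problems")
    print("Participants needed for 80%:", participants_needed(p, 0.80))
```

Under these assumptions, five participants are expected to uncover roughly 84% of the problems, which is where the "three to five subjects" rule of thumb comes from. Lower the value of p and the same formula calls for noticeably more participants, which is one reason the research can look contradictory.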