Comparing two designs (or anything else!) using paired sample T-tests - Improving the User Experience through Practical Data Analytics

CHAPTER 3

Comparing two designs (or anything else!) using paired sample T-tests

3.1 INTRODUCTION

So, how do you feel, UX analytics guru? Did you blow 'em away with your stats prowess? How many impressed looks did you get when you started to talk “p-values”? Well, don't get too cocky yet. The scenario we introduced with Mademoiselle La La in the previous chapter was pretty straightforward. You just launched a survey with two different designs to two different groups and sat back to see which one would win.

The reality is that you often don't have the luxury of obtaining even the moderately small sample sizes illustrated in the previous chapter. Why? Because much of your job invariably revolves around conducting good, old-fashioned usability tests. Standard usability tests are usually conducted with small sample sizes of 5-10 participants because (1) it's widely held that larger sample sizes reveal few additional problems, and (2) conducting traditional lab studies with large populations is both time-consuming and expensive.

SIDEBAR: SAMPLE SIZE: HOW MANY PARTICIPANTS DO YOU NEED FOR A USABILITY TEST?

One of the most contentious issues in the usability-testing field has been the sample size of participants needed to produce credible results. “How many participants are enough?” is an enduring question for practitioners, who often follow their intuition instead of relying on the research. They can hardly be blamed, since the research is sometimes contradictory. Because sample size is a key decision that must be made before recruiting for a test, the debate only muddies the waters when assessing the reliability of usability testing as a practice.

Virzi (1992), Nielsen and Landauer (1993), and Lewis (1994) have published influential articles on the topic of sample size in usability testing. In these articles, the authors presented a mathematical model for determining the sample size for usability tests. They offered empirical evidence for the model and made several important claims:

• Most usability problems are detected with three to five subjects.

• Running additional subjects during the same test is unlikely to reveal new information.

• Most severe usability problems are detected by the first few subjects. This last claim is supported by Virzi's data, but not by Lewis's data or by Law and Hvannberg's (2004) data.

Virzi's stated goal was to improve return on investment in product development by reducing the time and cost involved in product design. Nielsen and Landauer (1993) replicated and extended Virzi's (1992) original findings and reported case studies that supported their claims for needing only small sample sizes for usability tests.
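To make the "three to five subjects" claim concrete, here is a minimal sketch of the kind of model these articles are usually associated with. The sidebar does not spell out the formula, so treat the form below as an assumption: if each participant independently detects any given problem with probability p, the expected share of problems found by n participants is 1 - (1 - p)^n. The value p = 0.31 is the average often quoted from Nielsen and Landauer's data; it is used here purely for illustration, and the function names are hypothetical.

```python
import math


def proportion_found(n, p):
    """Expected share of problems found by n participants.

    Assumes (for illustration) that every participant independently
    detects any given problem with the same probability p.
    """
    return 1 - (1 - p) ** n


def participants_needed(target, p):
    """Smallest n whose expected share of problems found reaches target."""
    return math.ceil(math.log(1 - target) / math.log(1 - p))


# p = 0.31 is an illustrative assumption, not a value from this chapter.
P_DETECT = 0.31

for n in range(1, 11):
    share = proportion_found(n, P_DETECT)
    print(f"{n:2d} participants: {share:.0%} of problems expected")

print("Participants needed for 80%:", participants_needed(0.80, P_DETECT))
```

Under these assumptions, five participants are expected to uncover about 84% of the problems, and each additional participant adds less than the one before. The result is quite sensitive to p: if problems are harder to spot than assumed, the same five participants find far fewer of them, which is one reason the sample size debate has not gone away.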
