CHAPTER 4
Pass or fail? Binomial-related hypothesis testing and confidence intervals using independent samples
4.1 INTRODUCTION
Task success is a fundamental metric for any UX researcher who regularly conducts
usability testing. After all, the sine qua non of usability testing is tasks. Without
them, you don't have a usability test. You painstakingly write them, tweak them, meet
about them, get feedback from your colleagues on them, argue about them, try to get
consensus, tweak them some more…all the way up until test time. And depending on
what happens during your pilot test, they can change again.
And for good reason. Although there are lots of variables that go into creating a
usability test that will yield good results, the single most important variable is the
quality of the tasks. Clear, incisive ones yield meaningful, actionable data; flabby,
ambiguous tasks yield garbage. Writing good tasks is one of the most important
things you do to prepare for a usability test, hands down. So, it's easy to see that
pass/fail is a fundamental metric that you should always deliver for each task. As a
matter of fact, the task completion tally should probably go into your executive summary or near the top of your presentation.
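As a rough preview of that kind of reporting, here is a minimal sketch in Python; the language, the statsmodels library, the Wilson interval, and the 7-of-10 figure are all our own assumptions for illustration, not a prescription from this chapter. It shows one common way to present a task completion rate with a binomial confidence interval rather than as a bare percentage:

# Minimal sketch (illustrative only): report a task completion rate with a
# binomial confidence interval instead of a bare percentage.
from statsmodels.stats.proportion import proportion_confint

successes, n = 7, 10  # hypothetical: 7 of 10 participants completed the task
lower, upper = proportion_confint(successes, n, alpha=0.05, method="wilson")
print(f"Completion rate: {successes}/{n} = {successes / n:.0%}")
print(f"95% CI (Wilson score): {lower:.0%} to {upper:.0%}")
# With only 10 participants the interval is wide (roughly 40% to 90%),
# which is exactly why an interval is more informative than the point
# estimate alone.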
SIDEBAR: YOUR CURE FOR A.D.D.!
The audience for your usability test presentation will always sit up and take notice when you present the task completion rates, even if they “multitask” and nod off for the rest of your presentation. It's universally understood and carries immediate impact. You can spend a thousand words describing a particular usability problem or application until you're blue in the face (as our mothers might say), and still get more audience reaction by saying “only one out of eight participants completed the task.” In the world of today, where it is harder and harder to get and maintain a person's attention, it's often the only thing an audience member will take away from your presentation.
And as you're probably aware, no usability test consists of just one task. (Well, not in our experience,
anyway.) You typically have anywhere from 5 to 15 of them, depending on what you're testing, what
you're trying to find out, how much time you have for the test, and how much time you're willing to spend
on each task. In our experience, 10 tasks is about the average number of tasks for a 1-hour usability test.
So, multiple tasks mean multiple task success rates, which naturally lead to comparisons. After
all, your design-and-development team probably wants to fix the most egregious problems while leaving the less severe problems to fix in a future release. But how do you discern, for example, whether the result of 7 failures out of 10 for a particular task is really more severe than the result of 5 failures out of 10 for another task? Read on!
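To make that question concrete, here is a minimal sketch in Python; SciPy and Fisher's exact test are our choices for illustration and not necessarily the procedure this chapter develops:

# Minimal sketch (illustrative only): are 7 failures out of 10 really worse
# than 5 failures out of 10? Fisher's exact test on the 2 x 2 table of
# failures vs. successes for two independent tasks.
from scipy.stats import fisher_exact

table = [
    [7, 3],  # Task A: 7 failures, 3 successes
    [5, 5],  # Task B: 5 failures, 5 successes
]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
# p comes out around 0.65, so with samples this small the apparent
# difference between the two tasks could easily be due to chance.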
 