CHAPTER 4
Pass or fail? Binomial-related hypothesis testing and confidence intervals using independent samples
4.1 INTRODUCTION
Task success is a fundamental metric for any UX researcher who regularly conducts
usability testing. After all, the sine qua non of usability testing is tasks. Without
them, you don't have a usability test. You painstakingly write them, tweak them, meet
about them, get feedback from your colleagues on them, argue about them, try to get
consensus, tweak them some more…all the way up until test time. And depending on
what happens during your pilot test, they can change again.
And for good reason. Although there are lots of variables that go into creating a
usability test that will yield good results, the single most important variable is the
quality of the tasks. Clear, incisive ones yield meaningful, actionable data; flabby,
ambiguous tasks yield garbage. Writing good tasks is one of the most important
things you do to prepare for a usability test, hands down. So, it's easy to see that
pass/fail is a fundamental metric that you should always deliver for each task. As a
matter of fact, the task completion tally should probably go into your executive summary or near the top of your presentation.
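As a rough preview of that kind of reporting, here is a minimal sketch in Python; the language, the statsmodels library, the Wilson interval, and the 7-of-10 figure are all our own assumptions for illustration, not a prescription from this chapter. It shows one common way to present a task completion rate with a binomial confidence interval rather than as a bare percentage:

# Minimal sketch (illustrative only): report a task completion rate with a
# binomial confidence interval instead of a bare percentage.
from statsmodels.stats.proportion import proportion_confint

successes, n = 7, 10  # hypothetical: 7 of 10 participants completed the task
lower, upper = proportion_confint(successes, n, alpha=0.05, method="wilson")
print(f"Completion rate: {successes}/{n} = {successes / n:.0%}")
print(f"95% CI (Wilson score): {lower:.0%} to {upper:.0%}")
# With only 10 participants the interval is wide (roughly 40% to 90%),
# which is exactly why an interval is more informative than the point
# estimate alone.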
SIDEBAR: YOUR CURE FOR A.D.D.!
The audience for your usability test presentation will always sit up and take notice when you present the task completion rates, even if they “multitask” and nod off for the rest of your presentation. It's universally understood and carries immediate impact. You can spend a thousand words describing a particular usability problem or application until you're blue in the face (as our mothers might say), and still get more audience reaction by saying “only one out of eight participants completed the task.” In the world of today, where it is harder and harder to get and maintain a person's attention, it's often the only thing an audience member will take away from your presentation.
And as you're probably aware, no usability test consists of just one task. (Well, not in our experience,
anyway.) You typically have anywhere from 5 to 15 of them, depending on what you're testing, what
you're trying to find out, how much time you have for the test, and how much time you're willing to spend
on each task. In our experience, 10 tasks is about the average number of tasks for a 1-hour usability test.
So, multiple tasks mean multiple task success rates, which naturally lead to comparisons. After
all, your design-and-development team probably wants to fix the most egregious problems while leaving the less severe problems to fix in a future release. But how do you discern, for example, whether the result of 7 failures out of 10 for a particular task is really more severe than the result of 5 failures out of 10 for another task? Read on!
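To make that question concrete, here is a minimal sketch in Python; SciPy and Fisher's exact test are our choices for illustration and not necessarily the procedure this chapter develops:

# Minimal sketch (illustrative only): are 7 failures out of 10 really worse
# than 5 failures out of 10? Fisher's exact test on the 2 x 2 table of
# failures vs. successes for two independent tasks.
from scipy.stats import fisher_exact

table = [
    [7, 3],  # Task A: 7 failures, 3 successes
    [5, 5],  # Task B: 5 failures, 5 successes
]
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, p = {p_value:.3f}")
# p comes out around 0.65, so with samples this small the apparent
# difference between the two tasks could easily be due to chance.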
 