research), we use top-box scores in some situations. (It's always important to
understand who you're presenting your results to.)
HOW DO YOU CALCULATE CONFIDENCE INTERVALS FOR TOP-BOX SCORES?
If you're calculating means of ratings, then you can calculate confidence intervals in the
same way you do for any other continuous data: using the “=CONFIDENCE” function
in Excel. But if you're calculating top-box or top-2-box scores, it's not so simple. When
you calculate a top-box or top-2-box value for each rating, you're turning it into binary
data: each rating is either a top-box value (or top-2-box value) or it's not. This is obvious
from Figure 6.1, where each of the top-box (or top-2-box) values is either a "0" or a "1".
This should ring some mental bells: it's like the task success data that we examined in
Chapter 4. When dealing with binary data, confidence intervals need to be calculated
using the Adjusted Wald Method. See Chapter 4 for details.
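As a rough sketch, the top-box (or top-2-box) coding and the Adjusted Wald interval could be computed in Python along the following lines. The ratings data and the function name are invented for illustration; only the method itself (add z²/2 successes and z² trials before applying the usual Wald formula) comes from the Adjusted Wald approach described in Chapter 4.

```python
import statistics

def adjusted_wald_ci(successes, n, confidence=0.95):
    """Adjusted Wald (Agresti-Coull) interval for a binomial proportion,
    such as the proportion of participants giving a top-box rating."""
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    # "Adjust" the counts: add z^2/2 successes and z^2 trials, then use the Wald formula
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * (p_adj * (1 - p_adj) / n_adj) ** 0.5
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# Hypothetical ratings on a 5-point scale, recoded as top-2-box (1 if rating is 4 or 5)
ratings = [5, 4, 3, 5, 2, 4, 5, 4, 1, 5, 4, 3, 5, 4, 2, 5, 4, 4, 3, 5]
top2 = [1 if r >= 4 else 0 for r in ratings]
low, high = adjusted_wald_ci(sum(top2), len(top2))
print(f"Top-2-box score: {sum(top2)/len(top2):.0%}, 95% CI: {low:.0%} to {high:.0%}")
```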
6.3 POST-TASK RATINGS
The main goal of ratings associated with each task is to give you some insight
into which tasks the participants thought were the most difficult. This can then
point you toward parts of the system or aspects of the product that need improve-
ment. One way to capture this information is to ask the participant to rate each
task on one or more scales. The next few sections examine some of the specific
techniques that have been used. For example, the data shown in Figure 6.2 show
that users of the Obama site rated Task 3 as the most difficult, while users of the
McCain site rated Task 2 as the most difficult.
6.3.1 Ease of Use
Probably the most common rating scale involves simply asking users to rate
how easy or how difficult each task was. This typically involves asking them to
rate the task using a five- or seven-point scale. Some UX professionals prefer
to use a traditional Likert scale, such as “This task was easy to complete” (1 =
Strongly Disagree, 3 = Neither Agree nor Disagree, 5 = Strongly Agree). Others
prefer to use a semantic differential technique with anchor terms such as “Easy/
Difficult.” Either technique will provide you with a measure of perceived usabil-
ity on a task level. Sauro and Dumas (2009) tested a single seven-point rating
scale, which they dubbed the "Single Ease Question" (SEQ):
Overall, this task was?
Very Difficult  o  o  o  o  o  o  o  Very Easy
They compared it to several other post-task ratings and found it to be among the
most effective.
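To illustrate how such ratings might be summarized per task, here is a minimal Python sketch that reports the mean SEQ rating and the top-2-box score for each task. The task names and responses are hypothetical, used only to show the calculation.

```python
from statistics import mean

# Hypothetical SEQ responses (1 = Very Difficult, 7 = Very Easy), one list per task
seq_ratings = {
    "Task 1": [6, 7, 5, 6, 7, 6, 4, 7],
    "Task 2": [3, 4, 2, 5, 3, 4, 3, 2],
    "Task 3": [5, 6, 6, 7, 5, 6, 7, 6],
}

for task, scores in seq_ratings.items():
    avg = mean(scores)
    # Top-2-box on a 7-point scale: proportion of ratings of 6 or 7
    top2 = sum(1 for s in scores if s >= 6) / len(scores)
    print(f"{task}: mean SEQ = {avg:.1f}, top-2-box = {top2:.0%}")
```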