research), we use top-box scores in some situations. (It's always important to
understand who you're presenting your results to.)
HOW DO YOU CALCULATE CONFIDENCE INTERVALS FOR TOP-BOX SCORES?
If you're calculating means of ratings, then you can calculate confidence intervals in the
same way you do for any other continuous data: using the “=CONFIDENCE” function
in Excel. But if you're calculating top-box or top-2-box scores, it's not so simple. When
you calculate a top-box or top-2-box value for each rating, you're turning it into binary
data: each rating is either a top-box value (or top-2-box value) or it's not. This is obvious
from Figure 6.1, where each of the top-box (or top-2-box) values is either a "0" or a "1".
This should ring some mental bells: it's like the task success data that we examined in
Chapter 4. When dealing with binary data, confidence intervals need to be calculated
using the Adjusted Wald Method. See Chapter 4 for details.
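As a rough sketch, the top-box (or top-2-box) coding and the Adjusted Wald interval could be computed in Python along the following lines. The ratings data and the function name are invented for illustration; only the method itself (add z²/2 successes and z² trials before applying the usual Wald formula) comes from the Adjusted Wald approach described in Chapter 4.

```python
import statistics

def adjusted_wald_ci(successes, n, confidence=0.95):
    """Adjusted Wald (Agresti-Coull) interval for a binomial proportion,
    such as the proportion of participants giving a top-box rating."""
    z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    # "Adjust" the counts: add z^2/2 successes and z^2 trials, then use the Wald formula
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * (p_adj * (1 - p_adj) / n_adj) ** 0.5
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# Hypothetical ratings on a 5-point scale, recoded as top-2-box (1 if rating is 4 or 5)
ratings = [5, 4, 3, 5, 2, 4, 5, 4, 1, 5, 4, 3, 5, 4, 2, 5, 4, 4, 3, 5]
top2 = [1 if r >= 4 else 0 for r in ratings]
low, high = adjusted_wald_ci(sum(top2), len(top2))
print(f"Top-2-box score: {sum(top2)/len(top2):.0%}, 95% CI: {low:.0%} to {high:.0%}")
```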
6.3 POST-TASK RATINGS
The main goal of ratings associated with each task is to give you some insight
into which tasks the participants thought were the most difficult. This can then
point you toward parts of the system or aspects of the product that need improve-
ment. One way to capture this information is to ask the participant to rate each
task on one or more scales. The next few sections examine some of the specific
techniques that have been used. For example, the data shown in Figure 6.2 show
that users of the Obama site rated Task 3 as the most difficult, while users of the
McCain site rated Task 2 as the most difficult.
6.3.1 Ease of Use
Probably the most common rating scale involves simply asking users to rate
how easy or how difficult each task was. This typically involves asking them to
rate the task using a five- or seven-point scale. Some UX professionals prefer
to use a traditional Likert scale, such as “This task was easy to complete” (1 =
Strongly Disagree, 3 = Neither Agree nor Disagree, 5 = Strongly Agree). Others
prefer to use a semantic differential technique with anchor terms such as “Easy/
Difficult.” Either technique will provide you with a measure of perceived usabil-
ity on a task level. Sauro and Dumas (2009) tested a single seven-point rating
scale, which they dubbed the "Single Ease Question" (SEQ):
Overall, this task was?
Very Difficult  o  o  o  o  o  o  o  Very Easy
They compared it to several other post-task ratings and found it to be among the
most effective.
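To illustrate how such ratings might be summarized per task, here is a minimal Python sketch that reports the mean SEQ rating and the top-2-box score for each task. The task names and responses are hypothetical, used only to show the calculation.

```python
from statistics import mean

# Hypothetical SEQ responses (1 = Very Difficult, 7 = Very Easy), one list per task
seq_ratings = {
    "Task 1": [6, 7, 5, 6, 7, 6, 4, 7],
    "Task 2": [3, 4, 2, 5, 3, 4, 3, 2],
    "Task 3": [5, 6, 6, 7, 5, 6, 7, 6],
}

for task, scores in seq_ratings.items():
    avg = mean(scores)
    # Top-2-box on a 7-point scale: proportion of ratings of 6 or 7
    top2 = sum(1 for s in scores if s >= 6) / len(scores)
    print(f"{task}: mean SEQ = {avg:.1f}, top-2-box = {top2:.0%}")
```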