(similar to ASQ described in Chapter 6). Note that all of their analyses are at the
task level, whereas the previous sections have described analyses at the
“usability test” level. At the task level, task completion is typically a binary
variable for each participant: that person either completed the task successfully
or did not. At the usability-test level, task completion, as shown in previous
sections, indicates how many tasks each person completed, and it can be expressed
as a percentage for each participant.
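
The two levels of analysis can be sketched from the same grid of binary results. The snippet below is only an illustration: the participant labels, the completion values, and the function names are made up, not data from any study.

```python
# Rows are participants, columns are tasks; 1 = completed, 0 = not completed.
# All values are invented for illustration.
completion = {
    "P1": [1, 0, 1, 1],
    "P2": [1, 1, 1, 0],
    "P3": [0, 0, 1, 1],
}

def per_participant_pct(data):
    """Usability-test level: percentage of tasks each participant completed."""
    return {p: 100 * sum(row) / len(row) for p, row in data.items()}

def per_task_rate(data):
    """Task level: proportion of participants who completed each task."""
    rows = list(data.values())
    n_tasks = len(rows[0])
    return [sum(row[t] for row in rows) / len(rows) for t in range(n_tasks)]

print(per_participant_pct(completion))
print(per_task_rate(completion))
```

Reading the grid by row gives one percentage per participant; reading it by column gives one completion rate per task.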
Sauro and Kindlund used techniques derived from Six Sigma methodology
(e.g., Breyfogle, 1999) to standardize their four usability metrics (task
completion, time, errors, and task rating) into a SUM. Conceptually, their
techniques are not that different from the z score and percentage transformations
described in the previous sections. In addition, they used Principal Components
Analysis, a statistical technique that looks at correlations between variables, to
determine whether all four of their metrics were contributing significantly to the
overall calculation of the single metric. They found that all four were significant
and, in fact, that each contributed about equally. Consequently, they decided that
each of the four metrics (once standardized) should contribute equally to the
calculation of the SUM score.
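
As a rough illustration of the idea, and not a reproduction of Sauro and Kindlund's published procedure, the sketch below turns task times into a 0-to-1 score by taking a z score against a hypothetical target time and then averages four standardized scores with equal weights. The parameter `spec_limit`, the function names, and the sample values are all assumptions made for this example.

```python
from statistics import NormalDist, mean, stdev

def time_score(times, spec_limit):
    """Share of a fitted normal curve at or below spec_limit (0..1).

    spec_limit is a hypothetical target time in seconds; a higher score
    means more of the expected task times fall within the target.
    """
    z = (spec_limit - mean(times)) / stdev(times)
    return NormalDist().cdf(z)

def sum_score(completion, error_free, time, rating):
    """Equal-weight average of four already-standardized scores (each 0..1)."""
    return (completion + error_free + time + rating) / 4

# Invented sample: three task times and a 50-second target.
t = time_score([40, 50, 60], spec_limit=50)
print(sum_score(completion=0.8, error_free=0.9, time=t, rating=0.6))
```

Equal weighting here simply mirrors the finding described above that each metric contributed about equally; a different weighting scheme would replace the division by four with a weighted sum.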
An online tool for entering data from a usability test and calculating the SUM
score is available from Jeff Sauro's “Usability Scorecard” website at
http://www.usabilityscorecard.com/. For each task and each participant in the usability test,
you must enter the following:
• Whether the participant completed the task successfully (0 or 1).
• Number of errors committed on that task by that participant. (You also
  specify the number of error opportunities for each task.)
• Task time in seconds for that participant.
• Post-task satisfaction rating, which is an average of three post-task ratings
  on five-point scales of task ease, satisfaction, and perceived time, similar
  to ASQ.
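
One way to picture the inputs listed above is as a single record per participant per task. The class and field names below are hypothetical and are not the tool's actual data format; the error-rate helper simply divides errors by the error opportunities specified for the task.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class TaskObservation:
    """One participant's data for one task (hypothetical layout)."""
    completed: int        # 1 = completed successfully, 0 = did not
    errors: int           # errors committed on this task
    time_seconds: float   # task time in seconds
    ease: int             # five-point rating of task ease
    satisfaction: int     # five-point rating of satisfaction
    perceived_time: int   # five-point rating of perceived time

    def post_task_rating(self):
        """Average of the three five-point post-task ratings."""
        return mean([self.ease, self.satisfaction, self.perceived_time])

def error_rate(observation, error_opportunities):
    """Errors committed as a proportion of the task's error opportunities."""
    return observation.errors / error_opportunities

# Invented example values.
obs = TaskObservation(completed=1, errors=2, time_seconds=95.0,
                      ease=4, satisfaction=5, perceived_time=3)
print(obs.post_task_rating(), error_rate(obs, error_opportunities=10))
```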
After entering these data for all the tasks, the tool standardizes the scores and
calculates the SUM score for each task. The standardized data for each task
are shown in Table 8.8. Note that a SUM score is calculated for each task,
which allows for overall comparisons of tasks. In these sample data, participants
did best on the “Cancel reservation” task and worst on the “Check restaurant
hours” task. An overall SUM score, 68% in this example, is also calculated, as is
a 90% confidence interval (53 to 88%), which is the average of the confidence
intervals of the SUM score for each task.
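
The averaging of per-task confidence intervals can be sketched as follows. The interval values are invented and are not those of Table 8.8, and the function name is hypothetical; the sketch only shows the arithmetic of averaging the lower bounds and the upper bounds separately.

```python
def average_interval(intervals):
    """Average the lower and upper bounds of per-task confidence intervals."""
    lows = [low for low, _ in intervals]
    highs = [high for _, high in intervals]
    return sum(lows) / len(lows), sum(highs) / len(highs)

# Invented per-task 90% confidence intervals for SUM scores, in percent.
task_intervals = [(40, 80), (60, 90), (50, 100)]
print(average_interval(task_intervals))
```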