Fifteen Teams Measured the Same Website
In May 2009, 15 U.S. and European teams independently and simultaneously carried out usability measurements of the Budget.com car rental website. The goals were to investigate the reproducibility of professional usability measurements and to see how experienced professionals actually carry out such measurements.
The measurements were based on a common scenario and instructions. The scenario
deliberately did not specify in detail which measures the teams were supposed to collect
and report, although participants were asked to collect time-on-task, task success,
and satisfaction data, as well as any qualitative data they normally would collect. The
anonymous reports from the 15 participating teams are publicly available online (http://www.dialogdesign.dk/CUE-8.htm).
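To make the three required quantitative measures concrete, the following minimal Python sketch shows how a team might record and summarize them for one task. The record layout, field names, and sample values are illustrative assumptions, not data from the study.

    from statistics import mean

    # Hypothetical per-session records for one task. Field names and
    # values are illustrative only, not data from the CUE-8 study.
    sessions = [
        {"participant": "P1", "time_on_task_s": 187, "success": True,  "sus": 72.5},
        {"participant": "P2", "time_on_task_s": 242, "success": False, "sus": 55.0},
        {"participant": "P3", "time_on_task_s": 165, "success": True,  "sus": 80.0},
    ]

    # The three required quantitative measures, summarized across sessions.
    print("Mean time on task (s):", mean(s["time_on_task_s"] for s in sessions))
    print("Task success rate:", sum(s["success"] for s in sessions) / len(sessions))
    print("Mean satisfaction (SUS):", mean(s["sus"] for s in sessions))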
All teams were asked to measure the same five tasks in their study, for example, "Rent an intermediate size car at Logan Airport in Boston, Massachusetts, from Thursday 11 June 2009 at 09.00 am to Monday 15 June at 3.00 pm. If asked for a name, use John Smith, email address john112233@hotmail.com. Do not submit the reservation."
Teams used from 9 to 313 test participants and from 21 to 128 hours to complete the study. Interestingly, the team that tested the most participants also spent the fewest hours on the study. This team used 21 person-hours to conduct 313 sessions, all of which were unmoderated.
Eight of the 15 teams used the SUS questionnaire for measuring subjective satisfaction.
Despite its known shortcomings, SUS seems to be the current industry standard. No other questionnaire was used by more than one team.
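The SUS scoring rule itself is standard: ten 1-5 Likert items, where odd-numbered items are positively worded and even-numbered items negatively worded. As a minimal sketch, a function like the one below computes the familiar 0-100 score; the function name and the example responses are invented for illustration and do not come from the CUE-8 teams' tooling.

    def sus_score(responses):
        # Standard SUS scoring: odd-numbered items contribute (response - 1),
        # even-numbered items contribute (5 - response); the summed
        # contributions are scaled by 2.5 to give a 0-100 score.
        if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
            raise ValueError("SUS needs ten responses, each between 1 and 5")
        total = sum(r - 1 if i % 2 == 0 else 5 - r
                    for i, r in enumerate(responses))  # index 0 is item 1
        return total * 2.5

    # Made-up example: a fairly satisfied participant.
    print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # 80.0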
Nine teams included qualitative results in addition to the required quantitative results.
The general feeling seemed to be that the qualitative results were a highly useful
by-product of the measurements.
The study is named CUE-8. It was the eighth in a series of Comparative Usability Evaluation studies (http://www.dialogdesign.dk/CUE.html).
Unmoderated Test Sessions
Six teams used unmoderated, automated measurements. Two of these six teams
supplemented unmoderated measurements with moderated measurement sessions. These
teams obtained valuable results, but some also found that their data from the unattended test sessions were contaminated or invalid. Some participants reported impossible task times, perhaps because they wanted to earn the reward with as little effort as possible.
An example of contaminated data is a reported time of 33 seconds to rent a car, which is impossible on the Budget.com website. The presence of obviously contaminated data raises serious doubts about the validity of all data in the data set. It's easy to spot blatantly unrealistic data, but what about a reported time of, for example, 146 seconds to rent a car in a data set that also contains unrealistic times? The 146 seconds look realistic, but how do you know that the unmoderated test participant did not use an unacceptable approach to arrive at the reported time?
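One pragmatic first line of defense is to screen reported task times against a plausibility floor before analysis. The sketch below is a hypothetical illustration, not the CUE-8 teams' method: the 60-second floor and the sample times are invented, and a time that survives the screen can still be invalid, exactly as the 146-second example shows.

    # Assumed minimum plausible completion time for this task; the value
    # is invented for illustration and would be calibrated from moderated
    # pilot sessions, not taken from the CUE-8 study.
    MIN_PLAUSIBLE_SECONDS = 60

    def screen_task_times(times):
        # Split reported task times into plausible and suspect lists.
        # Times below the floor (like the impossible 33 seconds) are
        # flagged for exclusion or manual review.
        plausible = [t for t in times if t >= MIN_PLAUSIBLE_SECONDS]
        suspect = [t for t in times if t < MIN_PLAUSIBLE_SECONDS]
        return plausible, suspect

    plausible, suspect = screen_task_times([33, 146, 210, 12, 305])
    print(plausible)  # [146, 210, 305]
    print(suspect)    # [33, 12]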
Unmoderated measurements are attractive from a resource point of view; however, data contamination is a serious problem, and it is not always clear what you are actually measuring.