Fifteen Teams Measured the Same Website
In May 2009, 15 U.S. and European teams independently and simultaneously carried out usability measurements of the Budget.com car rental website. The goals were to investigate the reproducibility of professional usability measurements and to see how experienced professionals actually carry out such measurements.
The measurements were based on a common scenario and instructions. The scenario
deliberately did not specify in detail which measures the teams were supposed to collect
and report, although participants were asked to collect time-on-task, task success,
and satisfaction data, as well as any qualitative data they normally would collect. The
anonymous reports from the 15 participating teams are publicly available online (http://www.dialogdesign.dk/CUE-8.htm).
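To make the three required quantitative measures concrete, the following minimal Python sketch shows how a team might record and summarize them for one task. The record layout, field names, and sample values are illustrative assumptions, not data from the study.

    from statistics import mean

    # Hypothetical per-session records for one task. Field names and
    # values are illustrative only, not data from the CUE-8 study.
    sessions = [
        {"participant": "P1", "time_on_task_s": 187, "success": True,  "sus": 72.5},
        {"participant": "P2", "time_on_task_s": 242, "success": False, "sus": 55.0},
        {"participant": "P3", "time_on_task_s": 165, "success": True,  "sus": 80.0},
    ]

    # The three required quantitative measures, summarized across sessions.
    print("Mean time on task (s):", mean(s["time_on_task_s"] for s in sessions))
    print("Task success rate:", sum(s["success"] for s in sessions) / len(sessions))
    print("Mean satisfaction (SUS):", mean(s["sus"] for s in sessions))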
All teams were asked to measure the same five tasks in their study, for example, "Rent an intermediate size car at Logan Airport in Boston, Massachusetts, from Thursday 11 June 2009 at 09.00 am to Monday 15 June at 3.00 pm. If asked for a name, use John Smith, email address john112233@hotmail.com. Do not submit the reservation."
Teams used from 9 to 313 test participants and from 21 to 128 hours to complete the study. Interestingly, the team that tested the most participants also spent the fewest hours on the study. This team used 21 person-hours to conduct 313 sessions, all of which were unmoderated.
Eight of the 15 teams used the SUS questionnaire for measuring subjective satisfaction.
Despite its known shortcomings, SUS seems to be the current industry standard. No other questionnaire was used by more than one team.
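The SUS scoring rule itself is standard: ten 1-5 Likert items, where odd-numbered items are positively worded and even-numbered items negatively worded. As a minimal sketch, a function like the one below computes the familiar 0-100 score; the function name and the example responses are invented for illustration and do not come from the CUE-8 teams' tooling.

    def sus_score(responses):
        # Standard SUS scoring: odd-numbered items contribute (response - 1),
        # even-numbered items contribute (5 - response); the summed
        # contributions are scaled by 2.5 to give a 0-100 score.
        if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
            raise ValueError("SUS needs ten responses, each between 1 and 5")
        total = sum(r - 1 if i % 2 == 0 else 5 - r
                    for i, r in enumerate(responses))  # index 0 is item 1
        return total * 2.5

    # Made-up example: a fairly satisfied participant.
    print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # 80.0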
Nine teams included qualitative results in addition to the required quantitative results.
The general feeling seemed to be that the qualitative results were a highly useful
by-product of the measurements.
The study is named CUE-8. It was the eighth in a series of Comparative Usability Evaluation studies (http://www.dialogdesign.dk/CUE.html).
Unmoderated Test Sessions
Six teams used unmoderated, automated measurements. Two of these six teams
supplemented unmoderated measurements with moderated measurement sessions. These
teams obtained valuable results, but some also found that their data from the unattended test sessions were contaminated or invalid. Some participants reported impossible task times, perhaps because they wanted to earn the reward with as little effort as possible.
An example of contaminated data is a reported time of 33 seconds to rent a car, which is impossible on the Budget.com website. The presence of obviously contaminated data raises serious doubts about the validity of all data in the data set. It's easy to spot blatantly unrealistic data, but what about a reported time of, for example, 146 seconds to rent a car in a data set that also contains unrealistic times? The 146 seconds look realistic, but how do you know that the unmoderated test participant did not use an unacceptable approach to arrive at the reported time?
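One pragmatic first line of defense is to screen reported task times against a plausibility floor before analysis. The sketch below is a hypothetical illustration, not the CUE-8 teams' method: the 60-second floor and the sample times are invented, and a time that survives the screen can still be invalid, exactly as the 146-second example shows.

    # Assumed minimum plausible completion time for this task; the value
    # is invented for illustration and would be calibrated from moderated
    # pilot sessions, not taken from the CUE-8 study.
    MIN_PLAUSIBLE_SECONDS = 60

    def screen_task_times(times):
        # Split reported task times into plausible and suspect lists.
        # Times below the floor (like the impossible 33 seconds) are
        # flagged for exclusion or manual review.
        plausible = [t for t in times if t >= MIN_PLAUSIBLE_SECONDS]
        suspect = [t for t in times if t < MIN_PLAUSIBLE_SECONDS]
        return plausible, suspect

    plausible, suspect = screen_task_times([33, 146, 210, 12, 305])
    print(plausible)  # [146, 210, 305]
    print(suspect)    # [33, 12]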
Unmoderated measurements are attractive from a resource point of view; however, data contamination is a serious problem, and it is not always clear what you are actually measuring.