2.5 MADEMOISELLE LA LA REDUX
Armed with this knowledge (and this topic!) you're ready to jump into action as the
UX researcher to determine the winner through a t-test with independent samples.
Now, there are lots of ways to collect this data, but probably the most economical and
most efficient way is an online survey.
An online survey is one of the easiest ways to collect attitudinal data from your target
audience. Typically, surveys contain some combination of open-ended comments,
binary yes/no responses, Likert-type rating scales, Net Promoter scales, and more.
In the case of Mademoiselle La La, you'll probably want to construct two different
surveys, identical except for the design shown. Each will collect basic demographic
data (gender, age, income, etc.) and then go on to reveal one of the two designs. The
survey will then probe on several different variables: organization of the page, aesthetics,
whether the page evokes certain attributes (in this case, sophistication), and then rate
agreement with some kind of bottom-line question, like "This home page is sophisticated."
Now, let's look specifically at using Excel or SPSS to perform the independent-
samples t-test on the data collected from our fictional survey.
SIDEBAR: YOU DON'T NEED THE SAME NUMBER OF PARTICIPANTS IN EACH GROUP
Before we dive in, it's important to note that we do NOT need to have the same number of people
evaluating each design, although if one were allocating people to each design, it would be most
efficient to split the folks in half, or as close as you can get to a 50/50 split.
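To make this concrete, here is a minimal sketch of the computation behind an independent-samples t-test on two groups of unequal size. It is written in Python rather than Excel or SPSS (which the chapter actually uses) simply so the arithmetic can be shown as a listing, and it uses Welch's version of the test, which does not assume equal variances. The ratings below are invented for illustration.

```python
import math

def welch_t(a, b):
    """Welch's independent-samples t statistic and approximate
    degrees of freedom for two groups of (possibly) unequal size."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Sample variances (n - 1 in the denominator).
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se = math.sqrt(va / na + vb / nb)  # standard error of the difference
    t = (ma - mb) / se
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va / na + vb / nb) ** 2 / (
        (va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1)
    )
    return t, df

# Hypothetical 7-point "This home page is sophisticated" ratings:
design_a = [5, 6, 5, 7, 6, 5, 6, 7]   # 8 respondents saw design A
design_b = [4, 5, 4, 5, 3, 4]         # only 6 respondents saw design B
t, df = welch_t(design_a, design_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```

In practice you would hand the two columns to Excel or SPSS and read the p-value directly; the point of the sketch is only that nothing in the computation requires the two group sizes to match.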
SIDEBAR: LOW SAMPLE SIZE MAY MEAN LOW POWER
We can use the same hypothesis-testing framework, and fix the significance level at whatever you wish
(usually 0.05), regardless of the sample size; this controls the probability of rejecting H0 when it is
true (called a "type 1 error"), which is one way an incorrect conclusion can be reached. However, there is
another way of reaching an incorrect conclusion, and that is to accept H0 when, indeed, it is false.
This is called a "type 2 error." The probability of this happening is one we usually do not control,
and cannot even compute unless we pick some very specific H1 scenario, which would typically be
arbitrary (after all, if we don't even know for sure whether the means are equal or unequal, it is very
unlikely we would ever know exactly how unequal they are if they are unequal!). The probability
of accepting H0 when it is false decreases as the sample size increases (ceteris paribus).
The complement of this probability of incorrect conclusion, which is the probability of
rejecting H0 when it is false (a good thing!), is called the power of the (hypothesis) test. With a
small sample size, the power of the test is often smaller than you might like (and, correspondingly,
the probability of accepting H0 when it is false, the type 2 error probability, is higher than one
might like it to be). Of course, it is difficult to quantify this notion of power when we cannot sensibly
determine the actual probabilities. Nevertheless, we repeat the key point of the sidebar: if the
sample size is small, then although you can still control (say, at 0.05) the probability of
rejecting H0 when it is true, you may be conducting a hypothesis test with a low amount of power.
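The sample-size effect the sidebar describes can be sketched numerically. If we assume a specific (and, as the sidebar warns, arbitrary) H1, say an effect size of d = 0.5 standard deviations between the two designs, then a standard normal approximation gives the power of a two-sided test at the 0.05 level. The Python below is an illustration of that approximation, not a procedure from the chapter.

```python
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Normal approximation to the power of a two-sided independent-samples
    test with n_per_group people per design and assumed effect size d."""
    z = NormalDist()                       # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value (1.96 at 0.05)
    delta = d * (n_per_group / 2) ** 0.5   # shift of the test statistic under H1
    # Power = probability the statistic lands in either rejection tail.
    return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)

print(approx_power(0.5, 30))    # ~0.49: barely a coin flip's chance of detecting d = 0.5
print(approx_power(0.5, 100))   # ~0.94: same effect, larger samples, much higher power
```

The specific numbers depend entirely on the assumed d, which is exactly the arbitrariness the sidebar warns about; the qualitative point, that power climbs with sample size, holds regardless.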