cannot directly assess the usability of an artifact using a single measure. In this
case you first have to operationalize the concept of usability, and then measure the
different dimensions separately to assess efficiency, effectiveness, satisfaction, and
so on. So although you are not measuring usability directly, by measuring the
separate dimensions you are improving your construct validity. At this stage you
will also need to think about the content validity.
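To make the idea of operationalization concrete, the following sketch shows one way to record the separate dimensions of usability as distinct measures rather than a single score. The dimension names, units, and scales here are illustrative assumptions, not a standard instrument:

```python
from dataclasses import dataclass

@dataclass
class UsabilityMeasures:
    """Operationalized dimensions of usability (hypothetical names and scales)."""
    efficiency_s: float    # mean task completion time in seconds (lower is better)
    effectiveness: float   # task success rate, 0..1 (higher is better)
    satisfaction: float    # questionnaire score, 0..100 (higher is better)

def summarize(m: UsabilityMeasures) -> dict:
    # Report each dimension separately rather than collapsing usability
    # into one number, since the construct has multiple distinct dimensions.
    return {
        "efficiency (s/task)": m.efficiency_s,
        "effectiveness (%)": round(m.effectiveness * 100, 1),
        "satisfaction (0-100)": m.satisfaction,
    }

print(summarize(UsabilityMeasures(efficiency_s=42.5,
                                  effectiveness=0.9,
                                  satisfaction=78.0)))
```

Keeping the dimensions separate in the analysis mirrors the point above: you improve construct validity by measuring efficiency, effectiveness, and satisfaction individually instead of claiming to measure "usability" directly.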
Content validity refers to whether the content of a measure or instrument
corresponds to the content of the construct that the test was designed to cover.
Again, if we think about an IQ test, its content validity is determined by whether
the items in the test cover all the different areas of intelligence that are discussed in
the literature. For a usability survey, you would systematically examine the items
in the survey to make sure that you had covered all of the relevant aspects of
usability for the artifact that you are evaluating. So you might have items about the
display layouts, the content, and so on. Often the way that content validity is
evaluated is by having domain experts compare the test items against the speci-
fication for the thing that is being tested.
Face validity (also called surface validity) refers to whether a test appears to
measure a certain criterion. It is closely related to content validity. The main
difference is that you assess content validity by using a systematic review, whereas
you assess face validity by having people make judgments about the test simply
based on the surface appearance of the test. You could assess face validity, for
example, by asking somebody (it does not have to be an expert) what they think
the test is measuring. Sometimes you may get more honest answers if you have
lower face validity, because the people doing the test focus more on the task
than on what they think is being tested. It is also worth noting that you should not rely
on face validity alone, because even so-called experts can get it wrong (consider,
for example, the way they used to test whether someone was a witch or not in the
Middle Ages).
Note that a test may have poor face validity but good construct validity. A game
in which you shoot a gun at letters might not appear to measure spelling ability, for
example, unless the letters pop up in a pattern that is based on correct spelling.
Similarly, a tank simulation game might have poor surface validity for a naval
task. However, both of these situations might have good construct validity in that
the mental representations or the perceptual goals and cues are accurately
represented (Smallman and St. John 2005). So you may have to consider how to trade
off face validity against construct validity (and content validity) when designing
your evaluation study.
Internal validity refers to how well conclusions can be drawn about cause-effect
(causal) relationships based on the study design, including the measures used, and
the situation in which the study was carried out. Internal validity is generally
highest for tightly controlled studies that investigate the effect of an independent
variable on a dependent variable, often run in a laboratory setting. To get good
internal validity you need to make sure you control for other effects that could
have an impact on the results you obtain. These include: