cannot directly assess the usability of an artifact using a single measure. In this
case you first have to operationalize the concept of usability, and then measure the
different dimensions separately to assess efficiency, effectiveness, satisfaction, and
so on. So although you are not measuring usability directly, by measuring the
separate dimensions you are improving your construct validity. At this stage you
will also need to think about the content validity.
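To make the idea of operationalization concrete, the following sketch shows one way to record the separate dimensions of usability as distinct measures rather than a single score. The dimension names, units, and scales here are illustrative assumptions, not a standard instrument:

```python
from dataclasses import dataclass

@dataclass
class UsabilityMeasures:
    """Operationalized dimensions of usability (hypothetical names and scales)."""
    efficiency_s: float    # mean task completion time in seconds (lower is better)
    effectiveness: float   # task success rate, 0..1 (higher is better)
    satisfaction: float    # questionnaire score, 0..100 (higher is better)

def summarize(m: UsabilityMeasures) -> dict:
    # Report each dimension separately rather than collapsing usability
    # into one number, since the construct has multiple distinct dimensions.
    return {
        "efficiency (s/task)": m.efficiency_s,
        "effectiveness (%)": round(m.effectiveness * 100, 1),
        "satisfaction (0-100)": m.satisfaction,
    }

print(summarize(UsabilityMeasures(efficiency_s=42.5,
                                  effectiveness=0.9,
                                  satisfaction=78.0)))
```

Keeping the dimensions separate in the analysis mirrors the point above: you improve construct validity by measuring efficiency, effectiveness, and satisfaction individually instead of claiming to measure "usability" directly.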
Content validity refers to whether the content of a measure or instrument
corresponds to the content of the construct that the test was designed to cover.
Again, if we think about an IQ test, its content validity is determined by whether
the items in the test cover all the different areas of intelligence that are discussed in
the literature. For a usability survey, you would systematically examine the items
in the survey to make sure that you had covered all of the relevant aspects of
usability for the artifact that you are evaluating. So you might have items about the
display layouts, the content, and so on. Often the way that content validity is
evaluated is by having domain experts compare the test items against the speci-
fication for the thing that is being tested.
Face validity (also called surface validity) refers to whether a test appears to
measure a certain criterion. It is closely related to content validity. The main
difference is that you assess content validity by using a systematic review, whereas
you assess face validity by having people make judgments about the test simply
based on the surface appearance of the test. You could assess face validity, for
example, by asking somebody (it does not have to be an expert) what they think
the test is measuring. Sometimes you may get more honest answers if you have
lower face validity, because the people doing the test focus more on the task
than on what they think is being tested. It is also worth noting that you should not rely
on face validity alone, because even so-called experts can get it wrong (consider,
for example, the way they used to test whether someone was a witch or not in the
Middle Ages).
Note that a test may have poor face validity but good construct validity. A game
in which you shoot a gun at letters might not appear to measure spelling ability, for
example, unless the letters pop up in a pattern that is based on correct spelling.
Similarly, a tank simulation game might have poor surface validity for a naval
task. However, both of these situations might have good construct validity in that
the mental representations or the perceptual goals and cues are accurately
represented (Smallman and St. John 2005). So you may have to consider how to trade
off face validity against construct validity (and content validity) when designing
your evaluation study.
Internal validity refers to how well conclusions can be drawn about cause-effect
(causal) relationships based on the study design, including the measures used, and
the situation in which the study was carried out. Internal validity is generally
highest for tightly controlled studies that investigate the effect of an independent
variable on a dependent variable, often run in a laboratory setting. To get good
internal validity you need to make sure you control for other effects that could
have an impact on the results you obtain. These include: