of predefined benchmark tasks. Defining a set of appropriate summative measures
remains difficult, however, because systems, unlike many manufactured goods,
cannot be assessed against tight specifications and tolerances.
13.2.5 Validity, Reliability, and Sensitivity
When it comes to designing an evaluation study, you want to make sure that you
are evaluating the right thing, that you can measure the effects that you are looking
for, and that the results can be generalized to other situations. To achieve these
goals, you will need to think about the issues of validity, reliability, and sensitivity.
This is true irrespective of whether you are collecting qualitative or quantitative
data.
13.2.5.1 Validity
Validity refers to whether the measure that you are using actually measures what
it is supposed to measure. Reliability, on the other hand, refers to the con-
sistency of a measure across different conditions. Note that it is possible to use a
measure that is valid but not reliable, and vice versa. You should be aiming
for high degrees of both validity and reliability.
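To make the distinction concrete, here is a minimal sketch in Python of one common reliability check: test-retest reliability, computed by correlating the same participants' scores across two measurement sessions. All of the data values and variable names are invented purely for illustration.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical task-completion times (in seconds) for the same five
# participants, measured in two separate sessions.
session_1 = np.array([12.1, 15.3, 11.8, 19.4, 14.2])
session_2 = np.array([12.7, 14.9, 12.3, 18.8, 15.0])

r, p = pearsonr(session_1, session_2)
print(f"Test-retest reliability: r = {r:.2f} (p = {p:.3f})")

# A high positive r suggests the measure is consistent across sessions;
# note that this says nothing about whether the measure is valid.

Pearson's r is used here only for simplicity; depending on your design, other reliability statistics (such as the intraclass correlation or Cronbach's alpha) may be more appropriate.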
In addition to validity and reliability, you will also need to consider the sen-
sitivity of the measure that you are using: does it respond sufficiently well to
changes in the independent variable? Validity, reliability, and sensitivity will all
differ, depending on the context in which you are doing the evaluation.
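As a rough illustration of checking sensitivity, the following sketch (again with invented data) computes Cohen's d, a standardized effect size, between scores gathered under two levels of an independent variable. If a manipulation that you have good reason to believe matters produces an effect size near zero, the measure may be too insensitive to detect the change.

import numpy as np

def cohens_d(a, b):
    # Standardized mean difference using the pooled standard deviation.
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1)
                  + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical task-completion times (seconds) under two interface designs.
old_ui = np.array([61.0, 58.5, 64.2, 59.8, 62.1])
new_ui = np.array([55.3, 54.1, 57.9, 53.6, 56.2])

print(f"Cohen's d = {cohens_d(old_ui, new_ui):.2f}")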
There are several types of validity that you will need to think about. Here they
are classified into two basic types:
• Instrument validity, which relates to the instruments or measures that you will
use in your evaluation. There are three subtypes: construct validity, content
validity, and face validity.
• Experimental validity, which relates to the generalizability of the results. There
are three subtypes: internal validity, external validity, and ecological validity.
We discuss each of these in more detail below, as well as explaining the trade-
offs that you may need to make when deciding which type of experimental
validity is important to your evaluation study.
Construct validity refers to the extent to which your instrument or measure really
does measure what you think it does. Probably the simplest example is to think
about an IQ test as a whole, and how much it actually measures intelligence.
Supporting evidence usually comes from both theory and testing, and can include
statistical analysis of how responses and test items are related. If you think about
usability, and how you measure that, things start to get a bit trickier because there
are several different dimensions to the concept of usability. In other words, you
will probably need more than one measure to capture the construct fully.
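As one concrete example of the kind of statistical analysis mentioned above, the sketch below computes corrected item-total correlations for a small questionnaire: each item is correlated with the sum of the remaining items. The response matrix is entirely hypothetical.

import numpy as np

# Rows are participants, columns are questionnaire items (1-5 scale).
responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
])

for item in range(responses.shape[1]):
    # Correlate each item with the total of all the other items.
    rest = np.delete(responses, item, axis=1).sum(axis=1)
    r = np.corrcoef(responses[:, item], rest)[0, 1]
    print(f"Item {item + 1}: corrected item-total r = {r:.2f}")

# Items that correlate poorly with the rest may not be tapping the
# same underlying construct.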