of predefined benchmark tasks. Defining a set of appropriate summative measures
remains difficult, however, because systems, unlike many manufactured goods,
cannot be assessed against tight specifications and tolerances.
13.2.5 Validity, Reliability, and Sensitivity
When it comes to designing an evaluation study, you want to make sure that you
are evaluating the right thing, that you can measure the effects that you are looking
for, and that the results can be generalized to other situations. To achieve these
goals, you will need to think about the issues of validity, reliability, and sensitivity.
This is true irrespective of whether you are collecting qualitative or quantitative
data.
13.2.5.1 Validity
Validity refers to whether the measure that you are using actually measures what
it is supposed to measure. Reliability, on the other hand, refers to the con-
sistency of a measure across different conditions. Note that it is possible to use a
measure that is valid but not reliable, and vice versa. You should be aiming
for high degrees of both validity and reliability.
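To make the distinction concrete, here is a minimal sketch in Python of one common reliability check: test-retest reliability, computed by correlating the same participants' scores across two measurement sessions. All of the data values and variable names are invented purely for illustration.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical task-completion times (in seconds) for the same five
# participants, measured in two separate sessions.
session_1 = np.array([12.1, 15.3, 11.8, 19.4, 14.2])
session_2 = np.array([12.7, 14.9, 12.3, 18.8, 15.0])

r, p = pearsonr(session_1, session_2)
print(f"Test-retest reliability: r = {r:.2f} (p = {p:.3f})")

# A high positive r suggests the measure is consistent across sessions;
# note that this says nothing about whether the measure is valid.

Pearson's r is used here only for simplicity; depending on your design, other reliability statistics (such as the intraclass correlation or Cronbach's alpha) may be more appropriate.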
In addition to validity and reliability, you will also need to consider the sen-
sitivity of the measure that you are using: does it respond sufficiently well to
changes in the independent variable? Validity, reliability, and sensitivity will all
differ, depending on the context in which you are doing the evaluation.
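As a rough illustration of checking sensitivity, the following sketch (again with invented data) computes Cohen's d, a standardized effect size, between scores gathered under two levels of an independent variable. If a manipulation that you have good reason to believe matters produces an effect size near zero, the measure may be too insensitive to detect the change.

import numpy as np

def cohens_d(a, b):
    # Standardized mean difference using the pooled standard deviation.
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(ddof=1)
                  + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical task-completion times (seconds) under two interface designs.
old_ui = np.array([61.0, 58.5, 64.2, 59.8, 62.1])
new_ui = np.array([55.3, 54.1, 57.9, 53.6, 56.2])

print(f"Cohen's d = {cohens_d(old_ui, new_ui):.2f}")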
There are several types of validity that you will need to think about. Here they
are classified into two basic types:
• Instrument validity, which relates to the instruments or measures that you will
use in your evaluation. There are three subtypes: construct validity, content
validity, and face validity.
• Experimental validity, which relates to the generalizability of the results. There
are three subtypes: internal validity, external validity, and ecological validity.
We discuss each of these in more detail below, as well as explaining the trade-
offs that you may need to make when deciding which type of experimental
validity is important to your evaluation study.
Construct validity refers to the extent to which your instrument or measure really
does measure what you think it does. Probably the simplest example is to think
about an IQ test as a whole, and how much it actually measures intelligence.
Supporting evidence usually comes from both theory and testing, and can include
statistical analysis of how responses and test items are related. If you think about
usability, and how you measure that, things start to get a bit trickier because there
are several different dimensions to the concept of usability. In other words, you
will probably need more than one measure to capture the construct fully.
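As one concrete example of the kind of statistical analysis mentioned above, the sketch below computes corrected item-total correlations for a small questionnaire: each item is correlated with the sum of the remaining items. The response matrix is entirely hypothetical.

import numpy as np

# Rows are participants, columns are questionnaire items (1-5 scale).
responses = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
])

for item in range(responses.shape[1]):
    # Correlate each item with the total of all the other items.
    rest = np.delete(responses, item, axis=1).sum(axis=1)
    r = np.corrcoef(responses[:, item], rest)[0, 1]
    print(f"Item {item + 1}: corrected item-total r = {r:.2f}")

# Items that correlate poorly with the rest may not be tapping the
# same underlying construct.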