a new information resource, but the users require 20 minutes to complete
an assigned task.
4. Inconsistent conditions: Unless multiple judges make their observa-
tions simultaneously, the phenomena observed can vary from judge to
judge. One judge may observe completion of a task in the morning, when
the persons completing the task are alert and energetic, while another judge
may observe in the evening when everyone is tired.
Number of Judges Needed
Although steps can be taken to reduce the effects of the factors listed above,
it is not possible to completely eliminate objectivist measurement errors
seen as differences among judges' ratings of the same performance. As
with the other facets of measurement, multiple observations (in this case
multiple judges) are necessary. The upper bound on reliability that can be
expected from a one-judge study is on the order of 0.5 [22]. In Chapter 5 we
saw that van der Lei and colleagues obtained a reliability of 0.65 when using
eight judges in the study of Hypercritic. In the self-test below, we see that
three judges may be sufficient for some situations. There is, however, no
precise way to determine this number in advance. A measurement study is
necessary to verify that acceptable reliability is obtained for any particular
situation.
Improving Measurement Using Judges
The general approach is to increase the number of judges to improve reli-
ability or to improve the measurement process itself, affecting both relia-
bility and validity, by training the judges or designing better instruments for
them to use. Increasing the number of judges helps only if the added judges
perform equivalently to the judges already included. If they do, the
Spearman-Brown prophecy formula estimates how much improvement can be
obtained. What makes a human judge intrinsically “good,” aside from
having expert knowledge of the performance domain being assessed, is
rarely clear to the investigator.
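
As a rough illustration of how the Spearman-Brown prophecy formula can be applied, the sketch below uses the standard form of the formula, in which the predicted reliability is k*r / (1 + (k - 1)*r) for an existing reliability r and a k-fold increase in the number of equivalent judges. The function names spearman_brown and judges_needed are hypothetical, and the value 0.5 is only the one-judge upper bound quoted earlier, used here as an example input. The prediction holds only under the assumption stated above, that added judges perform equivalently to those already included.

```python
import math


def spearman_brown(reliability: float, factor: float) -> float:
    """Predicted reliability when the number of equivalent judges
    is multiplied by `factor` (standard Spearman-Brown form)."""
    if not 0.0 < reliability < 1.0:
        raise ValueError("reliability must lie strictly between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return factor * reliability / (1.0 + (factor - 1.0) * reliability)


def judges_needed(single_judge_reliability: float, target: float) -> int:
    """Smallest whole number of equivalent judges predicted to reach
    `target`, obtained by solving the formula above for the factor."""
    r = single_judge_reliability
    if not (0.0 < r < 1.0 and 0.0 < target < 1.0):
        raise ValueError("both reliabilities must lie strictly between 0 and 1")
    exact = target * (1.0 - r) / (r * (1.0 - target))
    # Small tolerance so floating-point noise does not add an extra judge.
    return max(1, math.ceil(exact - 1e-9))


if __name__ == "__main__":
    # Illustrative input only: a single judge with reliability 0.5.
    print(spearman_brown(0.5, 3))     # predicted reliability with three judges
    print(judges_needed(0.5, 0.80))   # judges predicted to reach 0.80
```

With these inputs the formula predicts a reliability of 0.75 for three judges and calls for four judges to reach 0.80. Such figures are estimates only; a measurement study is still needed to verify the reliability actually obtained.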
To improve the quality of a measurement process employing judges, the
investigator can ensure that each judge observes a representative sample
of the phenomena of interest. A nested design can be helpful when there
is danger of asking each judge to do more observation than is reasonable.
The phenomena to be observed can be sectioned by time, or by other nat-
urally occurring criteria presented by the setting of the study. (For example,
Judge 1 observes on Monday the first week, Tuesday the second week, etc.)
Such a nested design also allows for a greater range of phenomena to form
the basis of the ratings, leading to greater generalizability of the results.
Laboratory studies, as opposed to field studies, give the investigator greater
control of the logistics of the study, making it easier to implement these
approaches.
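
The sectioning by time described above (Judge 1 observes on Monday the first week, Tuesday the second week, etc.) can be sketched as a simple rotation. This is a minimal sketch under assumptions not in the text: a five-day observation week, one observation day per judge per week, and a rotation that advances each judge by one day each week. The function name rotating_schedule and the judge labels are hypothetical.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]


def rotating_schedule(judges, weeks, days=DAYS):
    """Map (week number, day) to the judge observing in that slot.

    Each judge observes one day per week, and the assigned day
    advances by one each week, so that over time every judge
    samples a range of days."""
    if len(judges) > len(days):
        raise ValueError("this sketch assumes at most one judge per day")
    schedule = {}
    for week in range(weeks):
        for index, judge in enumerate(judges):
            day = days[(index + week) % len(days)]
            schedule[(week + 1, day)] = judge
    return schedule


if __name__ == "__main__":
    plan = rotating_schedule(["Judge 1", "Judge 2", "Judge 3"], weeks=2)
    for (week, day), judge in sorted(
        plan.items(), key=lambda item: (item[0][0], DAYS.index(item[0][1]))
    ):
        print(f"Week {week}, {day}: {judge}")
```

A rotation of this kind keeps each judge's workload to one day per week while spreading each judge's observations across different days, which is one way of pursuing the representative sampling discussed above.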