to generalize from that large number of cases without having to expose
every object to every case. In measurement studies, generalizability theory
(see Appendix A) provides a way to estimate the sources of measurement
error in nested designs.
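The Python below is a minimal sketch of what such an estimate involves. It assumes the simplest nested design: each object is rated by its own set of judges (judges nested within objects, written j:o), and every object has the same number of judges. The code estimates the two variance components from a one-way analysis of variance. It then projects a generalizability (G) coefficient for a chosen number of judges per object. The function names and example ratings are hypothetical. Designs with more facets need a fuller G-study of the kind Appendix A describes.

```python
from statistics import mean


def nested_g_study(ratings):
    """Variance components for a balanced one-facet nested design (j:o).

    Hypothetical helper: ratings[i] holds the scores object i received from
    its own n_j judges; no judge rates more than one object.
    """
    n_o, n_j = len(ratings), len(ratings[0])
    if n_o < 2 or n_j < 2 or any(len(row) != n_j for row in ratings):
        raise ValueError("need >= 2 objects and the same number (>= 2) of judges per object")
    grand = mean(x for row in ratings for x in row)
    obj_means = [mean(row) for row in ratings]
    # Between-objects mean square, df = n_o - 1
    ms_objects = n_j * sum((m - grand) ** 2 for m in obj_means) / (n_o - 1)
    # Judges-within-objects mean square (confounded with residual error), df = n_o * (n_j - 1)
    ss_within = sum((x - m) ** 2 for row, m in zip(ratings, obj_means) for x in row)
    ms_within = ss_within / (n_o * (n_j - 1))
    # Expected mean squares: E[MS_objects] = var_jo + n_j * var_o, E[MS_within] = var_jo
    var_objects = max((ms_objects - ms_within) / n_j, 0.0)  # negative estimates set to 0
    var_judges_error = ms_within
    return var_objects, var_judges_error


def g_coefficient(var_objects, var_judges_error, n_judges):
    """Projected G coefficient when each object's score is the mean of n_judges ratings."""
    return var_objects / (var_objects + var_judges_error / n_judges)


ratings = [  # hypothetical: 4 objects, each scored by 3 different judges
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
var_o, var_jo = nested_g_study(ratings)
for n in (1, 3, 6):
    print(f"{n} judge(s) per object: G = {g_coefficient(var_o, var_jo, n):.2f}")
```

In this design, judge effects cannot be separated from residual error, so the same error term serves for both relative and absolute decisions. A crossed design, in which every judge rates every object, would allow the judge facet to be estimated on its own.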
Judges Facet
The judges facet enters into a measurement problem whenever informed
human judges assess specific aspects of the quality of an activity or a
product they are observing. Judges become central to measurement in
informatics in situations where there are no reference standards or correct
answers for the attribute(s) under study. In these situations, the considered
opinions of human experts are the best available way to generate a measured
score. A study might employ experts to judge the quality of the interactions
between patients and clinicians as the clinicians enter patient data into an
information resource during the encounter. In another example, observers
may assess key aspects of the interaction of end users with a new information
resource during a beta test. As with any measurement process, the
primary concern is the correlation among the independent observations
(in this situation, the judges) and the resulting number of judges required
to obtain a reliable measurement. A set of “well-behaved” judges, all of
whom correlate with one another to an acceptable extent when rating a
representative sample of objects, can be said to form a scale. A large
literature on performance assessment by judges speaks in more detail to
many of the issues addressed here.18-20
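The two quantities named above are the agreement among judges and the number of judges needed for a reliable measurement. The Python below sketches both under common classical test theory assumptions, which are not taken from this text:

- every judge rates every object;
- the mean pairwise Pearson correlation among judges estimates the reliability of a single judge;
- the Spearman-Brown formula projects the reliability of the mean of k judges.

All function names and ratings are hypothetical illustrations, not a procedure prescribed here.

```python
from itertools import combinations
from math import ceil, sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def interjudge_correlations(ratings):
    """ratings[i][j] is the score judge j gave object i (fully crossed design).

    Returns the mean pairwise correlation and each judge's mean
    correlation with every other judge.
    """
    n_judges = len(ratings[0])
    cols = [[row[j] for row in ratings] for j in range(n_judges)]
    pair_r = {(a, b): pearson(cols[a], cols[b])
              for a, b in combinations(range(n_judges), 2)}
    mean_r = sum(pair_r.values()) / len(pair_r)
    per_judge = [
        sum(r for (a, b), r in pair_r.items() if j in (a, b)) / (n_judges - 1)
        for j in range(n_judges)
    ]
    return mean_r, per_judge


def spearman_brown(r, k):
    """Projected reliability of the mean of k judges, given single-judge reliability r."""
    return k * r / (1 + (k - 1) * r)


def judges_needed(r, target):
    """Smallest k whose Spearman-Brown reliability reaches target (0 < r, target < 1)."""
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("r and target must lie strictly between 0 and 1")
    k = target * (1 - r) / (r * (1 - target))
    # Subtract a tiny tolerance so floating-point noise (e.g. 4.000000000000001)
    # does not round up to an extra judge.
    return max(1, ceil(k - 1e-9))


ratings = [  # hypothetical: 5 objects rated by the same 3 judges on a 1-5 scale
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
]
mean_r, per_judge = interjudge_correlations(ratings)
print(f"mean interjudge r = {mean_r:.2f}")
print("per-judge mean r:", [round(r, 2) for r in per_judge])
print(f"reliability of 3 judges = {spearman_brown(mean_r, 3):.2f}")
print("judges needed for 0.8:", judges_needed(mean_r, 0.8))
```

For example, if the average interjudge correlation is 0.5, four judges would be needed to reach a projected reliability of 0.8. A judge whose mean correlation with the others falls clearly below the rest may not belong to the same scale.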
Sources of Variation Among Judges
Ideally, all judges of the same object, using the same criteria and forms to
record their opinions, should render highly correlated judgments. All variation
in scores would then reflect differences among objects, not among judges.
Many factors that erode interjudge agreement are well known and well
documented:21
1. Interpretation or logical effects: Judges may differ in their interpretations
of the attribute(s) to be rated and the meanings of the items on the
forms on which they record their judgments. They may give similar ratings
to attributes that are logically related in their own minds.
2. Judge tendency effects: Some judges are consistently overgenerous or
lenient; others are consistently hypercritical or stringent. Still others do not
use the full set of response options on a form and instead place all of their
ratings in a narrow region, usually near the middle of the range. This
phenomenon is known as a “central tendency” effect.
3. Insufficient exposure: Sometimes the logistics of a study require that
judges base their judgments on less exposure to the objects than is
necessary to come to an informed conclusion. This may occur, for example, if
investigators schedule 10 minutes of observation of end users working with