subsequently select cases to ensure balanced representation of these fea-
tures. With this approach the investigator can control the amount of vari-
ability in the case set, but must do so with recognition that it invokes the
trade-offs introduced earlier. Constraining the variability too much (making
the cases too similar) leads to high reliability of measurement but does not
allow generalization beyond that homogeneous set. On the other hand, pur-
poseful building of a highly diverse case set inevitably requires a larger
number of cases for reliable measurement.
Although building a case set from such a blueprint gives the investigator
a great deal of control, it generates a sample of cases that is contrived and
may be difficult to describe. With the second strategy, the investigator
selects cases based on natural occurrence, for example using consecutive
admissions to a hospital, or consecutive calls to the help desk, as the crite-
rion. The resulting set of cases has a clear reference population, but the vari-
ability in the case mix is not under the investigator's control. In a study of
a clinical decision support or biosurveillance system, for example, cases that
invoke the capabilities of this resource may not appear with sufficient fre-
quency in a naturally occurring sequence of cases.
Whichever strategy is followed, the key to this process is to have a defen-
sible selection plan that follows from the purposes of the study and in turn
allows the investigator to identify the population from which the cases were
selected. The implications of these strategies for demonstration study design
are discussed in Chapter 7.
Scoring
The execution of many tasks generates a result that can be scored by
formula or algorithm, with no human judgment required to generate a score
after the formula itself is established. This is often the case when the task
has a generally acknowledged reference standard or the problem has an
unambiguous correct answer. For example, the accuracy of a resource
performing protein structure prediction can be computed as the mean
displacement of the atoms' predicted location from their known actual loca-
tions as established experimentally. A task in clinical diagnosis may be
scored in relation to the location of the correct diagnosis on the hypothesis
list provided by the clinician, assuming that the correct diagnosis is known
with a high degree of certainty. In other circumstances, where there is no
reference standard or correct answer, the task does not lend itself to
formulaic scoring; human judges must then be employed to render an opinion
or a verdict that becomes the performance score. This almost always
generates a two-facet measurement problem that includes both tasks and
judges.
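The two formulaic scores described above can be sketched in a few lines of code. This is a minimal illustration, not a standard implementation: the function names and data layouts (coordinates as (x, y, z) tuples, hypotheses as an ordered list) are assumptions made for the example.

```python
import math

def mean_displacement(predicted, actual):
    """Mean Euclidean distance between predicted atom coordinates and
    their experimentally determined locations (lower is better).
    Assumes both inputs are equal-length lists of (x, y, z) tuples."""
    if len(predicted) != len(actual):
        raise ValueError("coordinate lists must align atom by atom")
    total = 0.0
    for (px, py, pz), (ax, ay, az) in zip(predicted, actual):
        total += math.sqrt((px - ax)**2 + (py - ay)**2 + (pz - az)**2)
    return total / len(predicted)

def diagnosis_rank_score(hypothesis_list, correct_diagnosis):
    """Position of the known correct diagnosis on the clinician's
    ranked hypothesis list (1 = listed first); None if absent."""
    try:
        return hypothesis_list.index(correct_diagnosis) + 1
    except ValueError:
        return None
```

Once such a formula is fixed, every case receives its score without further human judgment, which is precisely what distinguishes these tasks from those requiring judges.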
Even when tasks can be scored formulaically, the development of scoring
methods may not be straightforward and merits care. For example, the
apparently simple assignment of a score to a clinician's diagnostic hypo-