Biomedical Engineering Reference
In-Depth Information
interval as long as 3 months, or they may be unwilling to carry out the
ratings twice. Lengthening the time interval between ratings increases
the risk that changes in the prevailing standard of care against which
TraumAID's advice is judged, or personal experiences of the judges that
alter their perceptions of what constitutes good care, might change the
context in which these judgments are made. Under these circumstances,
disagreements between test and retest judgments could be attributed to
sources other than measurement error.
A set of somewhat differing observations taken at approximately the
same time, each purporting to measure the same attribute, may be called a
scale or an index.* Examples include sets of items on a written test or
questionnaire, a set of human judges or observers, or a set of performance
indicators that occur naturally in the world. The Dow Jones index is computed
from the market values of a selected set of stocks, and is accepted as an
indicator of the more abstract attribute of the performance of the New York
Stock Exchange. Thinking of measurement problems in terms of multiple
observations, forming a scale or index to assess a single attribute, is useful
for several reasons. First, without these multiple observations we may have
no way of estimating the reliability of a measurement process, because the
test-retest approach is often impractical. Second, a one-observation
measurement rarely is sufficiently reliable or valid for use in objectivist studies.
(How can we possibly determine, based on one arrow, at what point an
archer was aiming? A Dow Jones index that included only one company
could not possibly reflect the performance of the market as a whole.) Hence
multiple observations are usually necessary to produce a functioning
instrument. One shortcoming of the multiple observations approach is that the
observations we believe to be assessing a common attribute, and thus to
comprise a valid scale, may not behave as intended. (The archers who shoot
simultaneously may have different interpretations of where the bull's-eye
is.) To address this problem, there is a well-codified methodology for
constructing scales, to be discussed in Chapter 6.
Whether we use the test-retest method or the internal consistency
approach with co-occurring observations, the best estimate of the true value
of the attribute for each object is the average of the independent
observations. To compute the result of a measurement, we typically sum or average
the scores on the items comprising a scale or index. If we know the
reliability of a measurement process, we can then estimate the error due to
* Technically, there is a difference between a scale and an index, 6 but for purposes
of this discussion the terms can be used interchangeably. Also note that the term
scale has two uses in measurement. In addition to the definition given above, scale
can also refer to the set of response options from which one chooses when
completing a rating form or questionnaire. In popular parlance, one might say “respond
on a scale of 1 to 10” to indicate how satisfied you are with this information resource. We
often move freely, and without too much confusion, between these two uses of the
term scale.
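
The averaging step described above, in which the independent observations for each object are combined into a single score, can be illustrated with a minimal Python sketch. The function name scale_scores, the case labels, and the ratings are hypothetical and chosen only for illustration; they do not come from the TraumAID study or any other study discussed here.

```python
from statistics import mean


def scale_scores(observations):
    """Return the average of the independent observations for each object.

    `observations` maps an object label to a list of numeric observations
    (for example, the ratings given by several judges, or the scores on
    the items of a questionnaire).
    """
    return {obj: mean(values) for obj, values in observations.items()}


# Hypothetical data: three judges rate two cases on a 1-to-10 response scale.
ratings = {
    "case_A": [7, 8, 9],
    "case_B": [4, 5, 3],
}

scores = scale_scores(ratings)
for case, score in scores.items():
    print(case, score)
```

Summing instead of averaging would rank the objects identically as long as every object has the same number of observations; averaging keeps the result on the original response scale.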