Developing and Improving Measurement Methods - Evaluation Methods in Biomedical Informatics

high-stakes standardized tests, where very high reliability is necessary to

make decisions about each individual's competence, more than 100 knowledge

questions (items) are routinely used within a knowledge domain. In

this situation, large numbers of items are required both to attain the high

reliability necessary to generate a small standard error of measurement and

to sample adequately a broad domain of knowledge. For ratings of performance

by expert judges, fewer items on a form may be necessary because

the attribute to be rated is often specific. For any particular measurement

situation, a measurement study can determine how many items are necessary

and which items should be deleted or modified to improve the performance

of the item set hypothesized to comprise a scale.
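
The link between the number of items, reliability, and the standard error of measurement can be made concrete with two standard psychometric formulas that this passage does not itself name: the Spearman-Brown prophecy formula and SEM = SD * sqrt(1 - reliability). The Python sketch below is illustrative only. The function names and the numbers in the usage lines are hypothetical, and the Spearman-Brown formula assumes that any added items are comparable in quality to the existing ones.

```python
import math


def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a scale is made length_factor times as long.

    Assumes the added items behave like the existing ones.
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability); higher reliability gives a smaller SEM."""
    return score_sd * math.sqrt(1 - reliability)


def items_needed(current_items: int, current_reliability: float,
                 target_reliability: float) -> int:
    """Items required to reach target_reliability (Spearman-Brown solved for length)."""
    factor = (target_reliability * (1 - current_reliability)) / (
        current_reliability * (1 - target_reliability)
    )
    # Round first so floating-point noise does not push ceil() up by one item.
    return math.ceil(round(current_items * factor, 9))


if __name__ == "__main__":
    # Hypothetical numbers: a 20-item scale with reliability 0.80, score SD of 10.
    print(items_needed(20, 0.80, 0.95))
    print(spearman_brown(0.80, 2))
    print(standard_error_of_measurement(10, 0.80))
    print(standard_error_of_measurement(10, 0.95))
```

In practice a measurement study, as described above, supplies the observed reliability that these formulas take as input; the sketch only projects from that estimate.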

Improving Measurement with Items

We offer here several practical suggestions to minimize measurement errors

through attention to item design. We focus on ratings and elicitations

of attitudes and beliefs because these applications arise frequently during

the evaluations that are the focus of this topic.

1. Make items specific. Perhaps the single most important way to improve

items is to make them as specific as possible. The more information the

respondents get from the item itself about exactly what is being asked for

and what the response options mean, the greater the consistency and thus

the reliability of the results. Consider a basic item that may be part of a

multi-item rating form (Figure 6.7A). As a first step toward specificity, the

item should offer a definition of the attribute to be rated, as shown in Figure

6.7B. The next step is to change the response categories from broad qualitative

judgments to behavior or events that might be observed. As shown

in Figure 6.7C, we might change the logic of the responses by specifically

asking for the opinion as to how frequently the explanations were clear.

2. Match the logic of the response to that of the stem. This step is vitally

important. If the stem—the part of the item that elicits a response—

requests an estimate of a quantity, the response formats must offer a range

of reasonable quantities from which to choose. If the stem requests a

strength of belief, the response formats must offer an appropriate way to

express the strength of belief, such as the familiar “strongly agree” to

“strongly disagree” format.

3. Provide a range of semantically and logically distinct response options.

Be certain that the categories span the range of possible responses and do

not overlap. When response categories are given as quantitative ranges,

novice item developers often overlap the edges of the response ranges, as

in the following example.
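
To illustrate the point with hypothetical categories (these are not taken from the text), ranges such as 0-5, 5-10, and 10-15 leave a respondent whose answer is exactly 5 with two valid choices. Below is a minimal Python sketch of a check for such overlaps, and for gaps between categories, assuming integer-valued responses and inclusive bounds. The function name and the sample ranges are assumptions made for illustration.

```python
def check_ranges(ranges):
    """Return a list of problems found in (low, high) inclusive integer ranges.

    Reports categories that overlap an earlier category and gaps where
    some possible response would fit no category.
    """
    problems = []
    if not ranges:
        return problems

    ordered = sorted(ranges)
    _, highest_so_far = ordered[0]
    for low, high in ordered[1:]:
        if low <= highest_so_far:
            problems.append(f"overlap: {low}-{high} starts at or below {highest_so_far}")
        elif low > highest_so_far + 1:
            problems.append(f"gap: no category between {highest_so_far} and {low}")
        highest_so_far = max(highest_so_far, high)
    return problems


if __name__ == "__main__":
    overlapping = [(0, 5), (5, 10), (10, 15)]   # edges shared between categories
    distinct = [(0, 4), (5, 9), (10, 14)]       # each value fits exactly one category
    print(check_ranges(overlapping))
    print(check_ranges(distinct))
```

A check of this kind covers only the logical distinctness of quantitative categories; whether the categories span the range of plausible responses still depends on knowledge of the respondents.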
