the subject appears totally wrong, but loses nothing because the scoring rule imposes no penalty for being wrong. In other words, a subject maximizes his or her score by assigning a confidence of 1.0 to one answer, regardless of his or her true belief in the quality of any answer.
Despite this limitation, most calibration research has used some variation of this simple linear
scoring rule. In fact, comparing three complex proper scoring rules to the simple linear scoring
rule, Rippey (1970) reported that the simple linear scoring rule actually produced more reliable
results. Likewise, reviewing a number of these studies, Phillips (1970) concluded that the com-
plex proper scoring rules did not yield significantly different values than those collected using a
simple linear scoring rule, but, as expected, subjects found simple linear scoring rules more real-
istic and easier to understand.
Considering these tradeoffs, this study used a variant of the simple linear scoring rule that dis-
couraged gaming and guessing by penalizing wrong answers. The scoring rule used here was:
\[
S = r_k - \frac{\max_{i \neq k} r_i}{2}
\]
where S is the score, k refers to the correct alternative, r_k is the confidence probability assigned to that alternative, and the r_{i≠k} are the confidence probabilities assigned to the alternatives that turn out
to be incorrect. This variant of the simple scoring rule is easily understood because its implica-
tions can be more readily appreciated and the respondent can better understand the correspon-
dence between her beliefs and the numerical values she reports. Yet, subjects are encouraged to
report numerical values that correspond to their actual beliefs because of the penalty of one-half
the largest confidence value assigned to an incorrect alternative choice. This results in a scoring
rule that, for a four-alternative multiple-choice question, has a random guess expected value of
.125 and a scoring range from 1.0 to −0.5.
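As an illustration, the following short Python sketch scores a single four-alternative response under this penalized rule; the function name and calling convention are illustrative rather than taken from the study.

```python
# Minimal sketch of the penalized linear scoring rule described above:
# S = r_k - max(r_i, i != k) / 2, where r_k is the confidence assigned to
# the alternative that turns out to be correct. Names are illustrative.

def score_response(confidences, correct_index):
    """Score one multiple-choice response.

    confidences   -- confidence probabilities, one per alternative
                     (expected to sum to 1.0)
    correct_index -- index of the alternative that turns out to be correct
    """
    r_correct = confidences[correct_index]
    wrong = [r for i, r in enumerate(confidences) if i != correct_index]
    # Penalty: one-half the largest confidence placed on an incorrect alternative.
    return r_correct - max(wrong) / 2.0


if __name__ == "__main__":
    # Full confidence on the correct answer gives the maximum score of 1.0.
    print(score_response([1.0, 0.0, 0.0, 0.0], correct_index=0))     # 1.0
    # Full confidence on a wrong answer gives the minimum score of -0.5.
    print(score_response([0.0, 1.0, 0.0, 0.0], correct_index=0))     # -0.5
    # A uniform guess over four alternatives scores 0.25 - 0.25/2 = 0.125.
    print(score_response([0.25, 0.25, 0.25, 0.25], correct_index=0))  # 0.125
```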
Computing Calibration
The dependent variable for this study is calibration, the correspondence between an individual's
confidence in a decision and the actual quality of the decision. The most popular calculation for
calibration is:
\[
\text{calibration} = \frac{1}{N} \sum_{t=1}^{T} n_t (r_t - c_t)^2
\]
where N is the total number of responses, n_t is the number of times the confidence value r_t is used, c_t is the proportion correct for all items assigned confidence value r_t, and T is the total number of different response categories used (Clemen and Murphy, 1990). Using this formula, perfect calibration is a score of 0.0. The worst possible score, 1.0, can only be obtained when the responses are completely and consistently wrong, that is, r_t = 1.0 is always assigned to the wrong answer and r_t = 0.0 is always assigned to the answer that turns out to be correct.
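The following Python sketch computes this calibration score from a list of (confidence, correctness) pairs; the data structure and names are assumptions for illustration, not taken from the study.

```python
# Minimal sketch of the calibration measure of Clemen and Murphy (1990):
# calibration = (1/N) * sum over confidence categories of n_t * (r_t - c_t)^2.
from collections import defaultdict

def calibration(responses):
    """Compute calibration for an iterable of (r, correct) pairs.

    r       -- confidence value assigned to an alternative
    correct -- True when that alternative turned out to be the right answer
    """
    counts = defaultdict(int)  # n_t: number of uses of each confidence value
    hits = defaultdict(int)    # number of correct uses of each confidence value
    for r, correct in responses:
        counts[r] += 1
        if correct:
            hits[r] += 1

    n_total = sum(counts.values())       # N: total number of responses
    total = 0.0
    for r, n_t in counts.items():
        c_t = hits[r] / n_t              # proportion correct at confidence r
        total += n_t * (r - c_t) ** 2
    return total / n_total


if __name__ == "__main__":
    # Perfect calibration: high confidence always correct, zero never -> 0.0
    print(calibration([(1.0, True), (1.0, True), (0.0, False)]))
    # Worst case: full confidence on wrong answers, none on correct ones -> 1.0
    print(calibration([(1.0, False), (0.0, True)]))
```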
Data Analysis
Analysis of these data shows that the content of the problem, electrical circuit or people and places, had no effect on either percentage correct (questions 1-4, F(1,36) = .08, p > 0.7, and questions 5-10, F(1,36) = .01, p > 0.9) or user calibration (questions 1-4, F(1,36) = .07, p > 0.7, and questions