discourage gaming so that subjects are encouraged to report their beliefs honestly, and a formula
for calculating calibration. Each of these requirements is discussed in the next sections.
Recording Beliefs and Decisions
Following the most common approach in calibration research for collecting data, subjects in this
study answered a series of multiple-choice questions (such as those shown earlier) by reporting
both their decisions and their confidence in the correctness of each decision. Each subject answered
a total of ten multiple-choice questions. Four of these questions were taken from Bauer and
Johnson-Laird (1993), and the truth table used in that study was used to generate all of the
questions in this study. For each of these ten questions, the subject
selected one alternative as the correct alternative, and then assigned a confidence value to that
alternative and other alternatives, as desired. Analysis of pilot study data showed that assigning
confidence values to multiple alternatives improved user calibration, a finding consistent with that
of Sniezek, Paese, and Switzer (1990).
Recording Scales
Confidence is typically recorded on a scale ranging from 0 to 1 or some subset. In this study, this
range was divided into increments of five-hundredths (i.e., 0.0, 0.05, 0.10, 0.15,..., 1.0) because
research suggests that this is consistent with the respondent's “natural scaling” of decision confi-
dence (Winkler, 1971). 5
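The recording scale described above can be sketched in a few lines of Python. This is only an illustration: the names `CONFIDENCE_SCALE`, `snap_to_scale`, and `is_on_scale` are hypothetical and do not come from the study, and the snapping behavior is an assumption about how a raw value might be mapped onto the scale.

```python
# Hypothetical helpers; the names below are illustrative, not from the study.
STEP_COUNT = 20  # 20 steps of 0.05 span the range 0.0 to 1.0

# The 21 admissible confidence values: 0.0, 0.05, 0.10, ..., 1.0
CONFIDENCE_SCALE = [i / STEP_COUNT for i in range(STEP_COUNT + 1)]


def snap_to_scale(value: float) -> float:
    """Return the admissible scale value nearest to a raw confidence."""
    if not 0.0 <= value <= 1.0:
        raise ValueError("confidence must lie between 0 and 1")
    return round(value * STEP_COUNT) / STEP_COUNT


def is_on_scale(value: float, tol: float = 1e-9) -> bool:
    """Check whether a reported confidence is one of the 21 scale values."""
    return any(abs(value - step) <= tol for step in CONFIDENCE_SCALE)
```

Building the scale from integer step counts rather than by repeatedly adding 0.05 avoids accumulated floating-point drift.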
Scoring Rules
The purpose of a scoring rule is to encourage respondents to report their confidence in each
decision honestly by eliciting values that reflect the respondent's actual belief in the quality of his or
her selection. For this to occur, a scoring rule must (1) be understood by the subject so that its
implications and the correspondence between beliefs and numerical values can be fully appreci-
ated, and (2) maximize the subject's expected total score only when the subject reports values that
correspond to his or her actual beliefs (Stael von Holstein, 1970). Assume that a subject's true
decision confidence is expressed by the probability vector $P = (p_1, p_2, \ldots, p_n)$ for a mutually
exclusive and collectively exhaustive set of events, $\{E_1, E_2, \ldots, E_n\}$. Assume further that the
confidence values an assessor reports are represented by $R = (r_1, r_2, \ldots, r_n)$. A scoring rule
$S$ is proper if $S$ is maximized only when $r = p$ (that is, when $r_i = p_i$ for every $i$).
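One way to state this condition more explicitly is in terms of the subject's expected score. The formulation below is a sketch under two notational assumptions: $S_k(r)$ denotes the score received when event $E_k$ occurs and the subject has reported $r$, and $S(r, p)$ denotes the expected score of that report under the subject's true beliefs $p$.

```latex
% Expected score of report r under true beliefs p
S(r, p) = \sum_{k=1}^{n} p_k \, S_k(r)

% Strict properness: honest reporting is the unique maximizer
S(p, p) > S(r, p) \quad \text{for all } r \neq p
```

Under this reading, a subject who wants to maximize expected score has no better option than to report $r = p$.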
This requirement is satisfied only by a very few somewhat complex scoring rules that require
the respondent to perform high-level operations such as exponential, root, or log calculations
(Murphy and Winkler, 1970). These complex operations make it almost impossible for subjects to
quickly compute and fully appreciate the implications of their decisions and the correspondence
between their actual beliefs and the values they report. In other words, these scoring rules confuse
and may actually interfere with the subject's reporting values reflecting his or her actual beliefs.
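To make the complexity concrete, the sketch below implements three standard strictly proper rules (quadratic, spherical, and logarithmic) and checks numerically that each is maximized by honest reporting. The source does not name the specific rules it has in mind, so choosing these three is an assumption, as are all function names and the restriction to two alternatives on a 0.05 grid.

```python
import math


def quadratic_score(r, k):
    """Quadratic rule: 2 * r_k minus the sum of squared reports."""
    return 2 * r[k] - sum(x * x for x in r)


def spherical_score(r, k):
    """Spherical rule: r_k divided by the Euclidean norm of the report."""
    return r[k] / math.sqrt(sum(x * x for x in r))


def logarithmic_score(r, k):
    """Logarithmic rule: natural log of the confidence given to event k."""
    return math.log(r[k]) if r[k] > 0 else float("-inf")


def expected_score(rule, r, p):
    """Expected score of report r when the true beliefs are p."""
    # Events with zero true probability are skipped to avoid 0 * -inf.
    return sum(p_k * rule(r, k) for k, p_k in enumerate(p) if p_k > 0)


def best_report(rule, p, steps=20):
    """Grid-search two-alternative reports (q, 1 - q) in steps of 1/steps."""
    candidates = [(i / steps, 1 - i / steps) for i in range(steps + 1)]
    return max(candidates, key=lambda r: expected_score(rule, r, p))


if __name__ == "__main__":
    true_beliefs = (0.7, 0.3)
    for rule in (quadratic_score, spherical_score, logarithmic_score):
        print(rule.__name__, best_report(rule, true_beliefs))
```

For true beliefs of (0.7, 0.3), each rule's best report on the grid is the honest one, up to floating-point rounding. The point of the passage still stands: none of these scores is something a subject could easily compute mentally while answering.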
A scoring rule that meets the criterion of understandability is the well-known simple linear scoring
rule $S_k(r) = r_k$, where $k$ refers to the event that actually occurred and $r_k$ is the confidence
probability assigned by the subject to the $k$th response. Unfortunately, in its simplest form, this scoring
rule is not strictly proper because $S(r, p) = \sum_k p_k r_k$ is maximized by setting one $r_i$ (i.e., the $r_i$
corresponding to the largest $p_i$) equal to 1.0 and the other $r_i$s equal to 0.0. If $i = k$, then the subject appears
to have complete confidence in the answer that turns out to be correct. On the other hand, if $i \neq k$,