[2]. In particular, since $N_{ab} \le \min\{N_a, N_b\}$, it follows that $R_{ab}$ is bounded above by:

$$
R_{ab} \le \frac{N \cdot \min\{N_a, N_b\}}{\min\{N_a, N_b\} \cdot \max\{N_a, N_b\}} = \frac{N}{\max\{N_a, N_b\}}. \tag{15.4}
$$

Further, note that this upper bound is achieved whenever
$N_{ab}$ is equal to its maximum possible value, $\min\{N_a, N_b\}$. Thus, in cases where both $N_a$ and $N_b$ are small, $R_{ab}$ can become quite large. This situation commonly arises in practice
when an unusual drug name (e.g., a misspelling) occurs only once in the database
(implying $N_{ab} = N_a = 1$), in a record that also lists an unusual outcome (implying $N_b \ll N$). As a specific example, one “drug” listed in the portion of the
AERS database considered here is “unspecified weed killer,” which appears only
once, in a record that lists the rare adverse event “murder.” Since this adverse
event appears only $N_b = 147$ times in $N = 462{,}936$ records, this combination has the huge reporting ratio value $R_{ab} = N/N_b \approx 3149$.
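
The arithmetic in this example can be checked with a short sketch. It assumes the reporting ratio has the form $R_{ab} = N \cdot N_{ab} / (N_a N_b)$, which is the form implied by the bound in (15.4); the function names are hypothetical and chosen only for illustration.

```python
def reporting_ratio(n_total, n_a, n_b, n_ab):
    """Reporting ratio R_ab = N * N_ab / (N_a * N_b).

    Assumption: this form is inferred from the bound (15.4),
    not quoted from the text.
    """
    return n_total * n_ab / (n_a * n_b)


def reporting_ratio_bound(n_total, n_a, n_b):
    """Upper bound from (15.4): N / max{N_a, N_b}."""
    return n_total / max(n_a, n_b)


# "Unspecified weed killer" / "murder" example from the text:
# N = 462,936 records, N_a = 1, N_b = 147, N_ab = 1.
r = reporting_ratio(462_936, 1, 147, 1)
bound = reporting_ratio_bound(462_936, 1, 147)
print(f"R_ab = {r:.1f}, bound = {bound:.1f}")
```

Because $N_{ab}$ takes its maximum possible value $\min\{N_a, N_b\} = 1$ here, the computed ratio coincides with the bound.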
Several approaches have been proposed to overcome this difficulty. One is
the Bayesian shrinkage estimator of DuMouchel mentioned earlier [2]. Another
is the use of the proportional reporting ratio
$P_{ab}$ with an associated $\chi^2$ significance measure and minimum $N_{ab}$ limits to down-weight small samples [4, 9]. A different approach is taken here [10], based on the reporting ratio $R_{ab}$ and the statistical unexpectedness $u_{ab}$, defined as follows. Model the adverse event dataset
as an urn of $N$ balls, with the $N_a$ balls corresponding to records that list Drug
A colored black and the others colored white. In the absence of any association
between Drug A and Adverse Event B, the records listing Adverse Event B may
be viewed as a random sample of
$N_b$ balls drawn from the urn, of which $N_{ab}$ are black. It is a standard result that this number should follow the hypergeometric probability distribution [11, Ch. 6]. If $R_{ab} > 1$, then $N_{ab}$ is larger than the
value expected under this independence assumption, and the significance of this
difference can be assessed by computing the probability of observing a value as
large as or larger than
$N_{ab}$ from the hypergeometric distribution defined by $N$, $N_a$ and $N_b$. Conversely, if $R_{ab} < 1$, then $N_{ab}$
is smaller than expected and the
significance of this difference can be assessed by computing the probability of
observing a value as small as or smaller than
$N_{ab}$. The corresponding probability $\rho_{ab}$ is easily computed via standard routines for the cumulative hypergeometric distribution, and the statistical unexpectedness is defined as the reciprocal of this probability, $u_{ab} = 1/\rho_{ab}$.
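
The urn calculation can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in [10]: it sums exact hypergeometric probabilities with the standard library rather than calling a statistics package, the function names are hypothetical, and the handling of the tie case $R_{ab} = 1$ (treated here as the upper tail) is an assumption, since the text only covers $R_{ab} > 1$ and $R_{ab} < 1$.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k, n_total, n_a, n_b):
    """P(exactly k black balls) when n_b balls are drawn without
    replacement from an urn of n_total balls, n_a of them black."""
    return Fraction(comb(n_a, k) * comb(n_total - n_a, n_b - k),
                    comb(n_total, n_b))


def unexpectedness(n_total, n_a, n_b, n_ab):
    """Return (rho_ab, u_ab) with u_ab = 1 / rho_ab."""
    k_min = max(0, n_a + n_b - n_total)  # smallest feasible N_ab
    k_max = min(n_a, n_b)                # largest feasible N_ab
    if not k_min <= n_ab <= k_max:
        raise ValueError("n_ab is not feasible for these margins")
    # N_ab * N >= N_a * N_b is equivalent to R_ab >= 1.
    # Assumption: the tie R_ab == 1 is assigned to the upper tail.
    if n_ab * n_total >= n_a * n_b:
        ks = range(n_ab, k_max + 1)      # as large as or larger
    else:
        ks = range(k_min, n_ab + 1)      # as small as or smaller
    rho = sum(hypergeom_pmf(k, n_total, n_a, n_b) for k in ks)
    return float(rho), float(1 / rho)


# Weed killer / murder example: N = 462,936, N_a = 1, N_b = 147, N_ab = 1.
rho_ab, u_ab = unexpectedness(462_936, 1, 147, 1)
print(f"rho_ab = {rho_ab:.3e}, u_ab = {u_ab:.1f}")
```

With a single black ball in the urn, the probability of drawing it in a sample of $N_b$ balls is $N_b/N$, so for this record $u_{ab} = N/N_b$. Exact summation becomes slow when the tail contains many terms; a library routine for the cumulative hypergeometric distribution, such as `scipy.stats.hypergeom`, would normally be preferred for whole-database calculations.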
The reason for using $u_{ab}$ instead of $\rho_{ab}$ is that large $u_{ab}$ values merit our attention, which is advantageous in the graphical display used here. Also, since the $u_{ab}$ values span a wide numerical range, it is convenient to work instead with