\[
C_{KL}(G_{\mathrm{baseline}}, a) = \sum_{i=0}^{n} p_i \log \frac{p_i}{q_i}
\]

\[
p_i = C_B(v_i, G_{\mathrm{baseline}}), \qquad q_i = C_B(v_i, G_{\mathrm{updated}})
\]

For nodes where $p_i = 0$ or $q_i = 0$, we reset the value to a small number, $10^{-6}$, to avoid $\log(0)$.
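The divergence and the zero-reset rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name, the natural logarithm, and the example scores are assumptions, and the sketch does not normalize the scores because the text does not say whether they are normalized.

```python
import math

EPSILON = 1e-6  # the small number from the text, substituted for zeros


def centrality_divergence(p, q, eps=EPSILON):
    """Sum of p_i * log(p_i / q_i) over nodes, with zeros reset to eps.

    Assumes natural logarithm; the text does not specify the base.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same nodes")
    total = 0.0
    for p_i, q_i in zip(p, q):
        p_i = eps if p_i == 0 else p_i  # avoid log(0) in the numerator
        q_i = eps if q_i == 0 else q_i  # avoid division by zero
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical centrality scores for four nodes (illustrative values only)
baseline = [0.5, 0.3, 0.2, 0.0]
updated = [0.4, 0.3, 0.2, 0.1]
divergence = centrality_divergence(baseline, updated)
```

Without the reset, the fourth node (score 0 in the baseline) would raise a math domain error in `math.log`.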

8.1.4 Statistical Models

We constructed negative binomial (NB) and zero-inflated negative binomial (ZINB) models to validate the role of structural variation in predicting future citation counts of scientific publications. The negative binomial distribution is generated by a sequence of independent Bernoulli trials. Each trial is either a 'success' with probability $p$ or a 'failure' with probability $1 - p$. The terms success and failure are labels only and do not imply any practical preference. The random number of successes $X$ observed before a predefined number of failures $r$ is reached has a negative binomial distribution:

\[
X \sim \mathrm{NB}(r, p)
\]
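As a rough check on this definition, the probability mass function can be written directly from the Bernoulli-trial description. The sketch below assumes an integer $r$ and uses hypothetical values for $r$ and $p$; the function name is illustrative.

```python
import math


def nb_pmf(k, r, p):
    """P(X = k): k successes before the r-th failure, success probability p.

    Assumes r is a positive integer so that math.comb applies.
    """
    return math.comb(k + r - 1, k) * p**k * (1 - p) ** r


r, p = 3, 0.4          # hypothetical parameters
support = range(200)   # truncated support; the tail is negligible here
mean = sum(k * nb_pmf(k, r, p) for k in support)
variance = sum((k - mean) ** 2 * nb_pmf(k, r, p) for k in support)
```

For these values the mean is $rp/(1-p) = 2$ and the variance is $rp/(1-p)^2 \approx 3.33$, so the variance exceeds the mean.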

One can adapt this definition to describe a wide variety of count events. Citation counts are count events that exhibit over-dispersion, i.e. the variance is greater than the mean. NB models are commonly used in the literature to study this type of count event. Two types of dispersion parameters are used in the literature, $\theta$ and $\alpha$, where $\theta \cdot \alpha = 1$.
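One common convention, assumed here rather than stated in the text, is the quadratic (NB2) variance function, in which the two dispersion parameters enter the variance of a count with mean $\mu$ as follows:

```latex
\[
\operatorname{Var}(X) = \mu + \alpha \mu^{2} = \mu + \frac{\mu^{2}}{\theta},
\qquad \theta \cdot \alpha = 1
\]
```

Under this convention, a larger $\alpha$ (equivalently, a smaller $\theta$) means stronger over-dispersion, and as $\alpha \to 0$ the variance approaches the mean, which is the Poisson case.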

Zero-inflated count models are commonly used to account for excessive zero counts (Hilbe 2011; Lambert 1992). Zero-inflated models include two sources of zero citations: the point mass at zero $I_{\{0\}}(y)$ and the count component with a count distribution $f_{\mathrm{count}}(\text{counts})$ such as negative binomial or Poisson (Zeileis et al. 2011). The probability of observing a zero count is inflated with probability $\pi = f_{\mathrm{zero}}(\text{zero citations})$.

\[
f_{\text{zero-inflated}}(\text{citations}) = \pi \cdot I_{\{0\}}(\text{citations}) + (1 - \pi) \cdot f_{\mathrm{count}}(\text{citations})
\]
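The mixture can be illustrated with a short Python sketch. For brevity it assumes a Poisson count component (one of the two options the text mentions) and hypothetical values for the inflation probability and the Poisson rate; all names are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """Poisson probability of observing k events with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zero_inflated_pmf(k, pi, count_pmf):
    """pi * I{0}(k) + (1 - pi) * f_count(k)."""
    indicator = 1.0 if k == 0 else 0.0  # point mass at zero
    return pi * indicator + (1 - pi) * count_pmf(k)


pi, lam = 0.3, 2.0  # hypothetical inflation probability and Poisson rate


def count_component(k):
    return poisson_pmf(k, lam)


p_zero_plain = count_component(0)
p_zero_inflated = zero_inflated_pmf(0, pi, count_component)
total = sum(zero_inflated_pmf(k, pi, count_component) for k in range(60))
```

Zeros arise from both sources, so the zero-inflated probability of a zero count is larger than the count component alone would give, while the probabilities still sum to one.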

ZINB models are increasingly used in the literature to model excessive occurrences of zero citations (Fleming and Bromiley 2000; Upham et al. 2010). The report of a ZINB model consists of two parts: the count model and the zero-inflated model. One way to test whether a ZINB model is superior to a corresponding NB model is known as the Vuong test. The Vuong test is designed to test the null hypothesis that the two models are indistinguishable. Akaike's Information Criterion (AIC) is also commonly used to evaluate the goodness of a model. Models with lower AIC scores are regarded as better models.
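Both comparisons can be sketched from the log-likelihoods of two fitted models. The example below is a simplified illustration: it computes AIC as $2k - 2\ln L$ and the unadjusted Vuong statistic from per-observation log-likelihoods. The log-likelihood values and parameter counts are made up for illustration, and the function names are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def vuong_statistic(loglik_a, loglik_b):
    """Unadjusted Vuong z-statistic from per-observation log-likelihoods.

    Positive values favor model a, negative values favor model b.
    Uses the 1/n form of the standard deviation; some implementations
    use 1/(n - 1) or add a correction for the number of parameters.
    """
    m = [a - b for a, b in zip(loglik_a, loglik_b)]
    n = len(m)
    mean_m = sum(m) / n
    sd_m = math.sqrt(sum((x - mean_m) ** 2 for x in m) / n)
    return math.sqrt(n) * mean_m / sd_m


# Made-up per-observation log-likelihoods for four publications
loglik_zinb = [-1.0, -2.0, -1.5, -0.5]
loglik_nb = [-1.2, -2.5, -1.6, -1.1]

z = vuong_statistic(loglik_zinb, loglik_nb)
aic_zinb = aic(sum(loglik_zinb), n_params=4)  # parameter counts are assumed
aic_nb = aic(sum(loglik_nb), n_params=3)
```

Under the null hypothesis the statistic is asymptotically standard normal, so a value beyond roughly 1.96 in absolute terms would reject indistinguishability at the 5% level. A sample of four observations is far too small for that approximation and serves only to show the arithmetic.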
