$$C_{KL}(G_{baseline}, a) = \sum_{i=0} p_i \log \frac{p_i}{q_i}$$
$$p_i = C_B(v_i, G_{baseline})$$
$$q_i = C_B(v_i, G_{updated})$$
For nodes where $p_i = 0$ or $q_i = 0$, we reset them to a small number, $10^{-6}$, to avoid
$\log(0)$.
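The divergence and the zero-replacement rule above can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the function name `kl_divergence`, the use of the natural logarithm, and the example centrality scores are all assumptions made here.

```python
import math

EPSILON = 1e-6  # small value substituted for zero entries, as described above


def kl_divergence(p, q, eps=EPSILON):
    """Return sum_i p_i * log(p_i / q_i), replacing zero entries with eps.

    The natural logarithm is an assumption; the text does not state the base.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        p_i = eps if p_i == 0 else p_i  # avoid log(0) in the numerator
        q_i = eps if q_i == 0 else q_i  # avoid division by zero
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical betweenness centrality scores for four nodes,
# before (baseline) and after (updated) a new article adds links.
baseline = [0.5, 0.3, 0.2, 0.0]
updated = [0.4, 0.3, 0.2, 0.1]
print(kl_divergence(baseline, updated))
```

Identical distributions give a divergence of zero, and a node whose centrality is zero in only one of the two networks contributes a finite term instead of raising an error.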
Statistical Models
We constructed negative binomial (NB) and zero-inflated negative binomial (ZINB)
models to validate the role of structural variation in predicting future citation counts
of scientific publications. The negative binomial distribution is generated by a
sequence of independent Bernoulli trials. Each trial is either a 'success' with a
probability of $p$ or a 'failure' with a probability of $(1 - p)$. The terms
success and failure here do not necessarily represent any practical
preference. The random number of successes $X$ before encountering a predefined
number of failures r has a negative binomial distribution:
$$X \sim NB(r, p)$$
One can adapt this definition to describe a wide variety of count events. Citation
counts belong to a type of count events with over-dispersion, i.e. the variance is
greater than the mean. NB models are commonly used in the literature to study this
type of count events. Two types of dispersion parameters are used in the literature,
$\theta$ and $\varphi$, where $\theta \cdot \varphi = 1$.
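A short simulation can make the definition and the over-dispersion property concrete. The sketch below draws counts by literally running Bernoulli trials until the r-th failure; the function name `sample_nb`, the seed, and the values r = 3 and p = 0.6 are illustrative assumptions. Under this parameterization the theoretical mean is r p / (1 - p) and the variance is r p / (1 - p)^2, so the variance always exceeds the mean.

```python
import random


def sample_nb(r, p, rng):
    """Count successes before the r-th failure; each trial succeeds with probability p."""
    successes = 0
    failures = 0
    while failures < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return successes


rng = random.Random(0)  # fixed seed so the run is repeatable
r, p = 3, 0.6           # illustrative values, not taken from the text
draws = [sample_nb(r, p, rng) for _ in range(20000)]

mean = sum(draws) / len(draws)
variance = sum((x - mean) ** 2 for x in draws) / (len(draws) - 1)

print("sample mean:", mean, "theoretical:", r * p / (1 - p))
print("sample variance:", variance, "theoretical:", r * p / (1 - p) ** 2)
```

The sample variance should come out well above the sample mean, which is the over-dispersion pattern that motivates NB models for citation counts.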
Zero-inflated count models are commonly used to account for excessive zero
counts (Hilbe 2011; Lambert 1992). Zero-inflated models include two sources of
zero citations: the point mass at zero, $I_{\{0\}}(y)$, and the count component with a count
distribution $f_{count}(\text{counts})$, such as negative binomial or Poisson (Zeileis et al. 2011).
The probability of observing a zero count is inflated with probability $\pi = f_{zero}(\text{zero citations})$:
$$f_{zeroinflated}(\text{citations}) = \pi \cdot I_{\{0\}}(\text{citations}) + (1 - \pi) \cdot f_{count}(\text{citations})$$
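The mixture above can be written out directly as a probability mass function. The sketch below assumes the count component is negative binomial in the successes-before-r-failures form used earlier, with integer r; the names `nb_pmf`, `zinb_pmf`, and `pi`, and the parameter values, are hypothetical choices made for illustration.

```python
import math


def nb_pmf(k, r, p):
    """P(X = k) for k successes before the r-th failure (integer r, success probability p)."""
    return math.comb(k + r - 1, k) * p**k * (1 - p) ** r


def zinb_pmf(k, pi, r, p):
    """Zero-inflated NB: point mass at zero with weight pi, NB with weight 1 - pi."""
    indicator = 1.0 if k == 0 else 0.0
    return pi * indicator + (1 - pi) * nb_pmf(k, r, p)


# Illustrative parameters: 30% structural zeros on top of an NB(3, 0.6) count process.
pi, r, p = 0.3, 3, 0.6
print("P(0) under NB:  ", nb_pmf(0, r, p))
print("P(0) under ZINB:", zinb_pmf(0, pi, r, p))
print("total mass:     ", sum(zinb_pmf(k, pi, r, p) for k in range(500)))
```

Zeros can arise from either component, so the ZINB probability of a zero count is larger than the plain NB probability whenever pi is positive.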
ZINB models are increasingly used in the literature to model excessive occurrences
of zero citations (Fleming and Bromiley 2000; Upham et al. 2010). The report
of a ZINB model consists of two parts: the count model and the zero-inflated model.
One way to test whether a ZINB model is superior to a corresponding NB model is
known as the Vuong test. The Vuong test is designed to test the null hypothesis that
the two models are indistinguishable. Akaike's Information Criterion (AIC) is also
commonly used to evaluate the goodness of a model. Models with lower AIC scores
are regarded as better models.
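As a small illustration of the AIC comparison, the sketch below uses the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The log-likelihood values and parameter counts are hypothetical placeholders, not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical maximized log-likelihoods and parameter counts, for illustration only.
candidates = {
    "NB": aic(log_likelihood=-1520.4, n_params=4),
    "ZINB": aic(log_likelihood=-1498.7, n_params=6),
}

# The model with the lowest AIC is preferred.
best = min(candidates, key=candidates.get)
for name, score in candidates.items():
    print(name, round(score, 1))
print("preferred:", best)
```

The penalty term 2k means that the extra parameters of a ZINB model must buy a sufficiently large gain in log-likelihood before its AIC drops below that of the NB model.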