Biology Reference
In-Depth Information
from Statistics Denmark (2009) on the number of individuals on January 1,
1994-2008, in each of the eight age groups.
R> population(momo) <- as.matrix(read.csv("population-dk.csv",
check.names = FALSE))
We will in the following use a generalized log-link negative binomial model for
the in-control situation of a specific age group, that is, y t ~ NegBin( μ 0,t , α) with
log(
µββ
, =+⋅+ +⋅ ,
)
tct
( )
β
pop
0
t
0
1
2
t
(12.6)
where c ( t ) is a cyclic function with period 52 or 53, depending on the number
of ISO weeks in the year of t , e.g., c (0) = c (52) for years with 52 ISO weeks.
Such behavior can, e.g., be obtained by sinusoidals (Serfling, 1963) or using
cyclic splines (Wood, 2006). Furthermore, pop t denotes the population size
in the respective age group at time t . In the above negative binomial model
E ( yt ) = μ 0, t and Var(
2 , that is, α is a dispersion parameter,
which we will assume to be constant over time. Thus, with α> 0, we are able
to handle possible overdispersion of the count data time series. For α→0, the
negative binomial distribution tends to the Poisson distribution.
The out-of-control model for the mean is now assumed to be μ 1, t = κ· μ 0, t
which on the log-link scale corresponds to a level shift in the intercept from
β 0 to β 0 +log(κ). The following R code estimates such a negative binomial
GLM from the phase 1 data of the 75-84 age group using the glm.nb func-
tion (Venables and Ripley, 2002).
y t
)
=+⋅
,
µαµ
0
t
0
,
t
R> phase1 <- which(year(momo) == 2002 epochInYear(momo) ==
40):(phase2[1] - 1)
R> momo.df <- as.data.frame(momo)
R> m <- glm.nb('observed.[75,85)' 1 + epoch + sin(2 * pi *
epochInPeriod) + cos(2 * pi * epochInPeriod) + 'population.
[75,85)', data = momo.df[phase1,])
R> mu0 <- predict(m, newdata = momo.df[phase2, ], type =
"response")
Here, phase1 contains the index of all time epochs in the phase 1 sample
used to estimate the in-control parameters. A 5-year period has been used
above. Then the function as.data.frame is applied to convert the sts object
to the necessary data.frame used by glm.nb . For simplicity, a single har-
monic is used for c(t) consisting of one sine and one cosine term. The parame-
ter estimates for the other terms are
ˆ β =. ,
ˆ β =−. ⋅ ,
ˆ β =−. ⋅
10 49
95410
12410
5
5
0
1
2
and ˆ α= . ⋅
19710 3 . In practical application, one should perform a model selec-
tion process to decide on covariates and an appropriate number of harmon-
ics to include. For example, such a selection for the above model would reveal
pop t as being nonsignificant, whereas a total of three superimposed harmon-
ics could be justified. For illustration we, however, proceed with the above
Search WWH ::




Custom Search