Aberration Detection in R Illustrated by Danish Mortality Monitoring - Biosurveillance: Methods and Case Studies

Biology Reference

In-Depth Information

12.5.1 run-length Properties

As mentioned previously, the behavior of the CUSUM depends very much

on the choice of the threshold g . In order to guide the choice of g , we will

look at the run-length distribution of T A |τ = ∞ under the fitted negative bino-

mial model. Prediction of μ 0, t requires knowledge of all involved covariates

during the monitoring period, e.g., in model (6), this would be the popula-

tion size. For the monitored period of 65 weeks (2007-W40 - 2008-W52), these

values are available, but if monitoring exceeded this period, we would have

needed to predict covariate values as well before being able to compute μ 0, t .

Hence, it is practically more feasible to look at P ( T A ≤ t A | τ = ∞) for a small t A

than to estimate, for example, E ( T A |τ = ∞) as here many more time points

might be needed if the expectation is large. Furthermore, the distribution of

T A is also often skewed, which makes the expectation a bad summary of the

central tendency.

Specifically, we want to choose g such that P ( T A ≤ 6 5|τ = ∞) is below some

acceptable value, for example, 10%. In other words, the probability of a false

alarm within the 65 weeks of our μ 0, t versus μ 1, t monitoring should be below

10%. To compute the probability under the selected model, two approaches

exist: direct Monte Carlo estimation or a Markov chain approximation.

In the first approach, we use Monte Carlo estimation of P ( T A ≤ 6 5|τ = ∞). For

each realization j , a time series of length 65 is simulated from the estimated

negative binomial model with mean μ 0, t , t =1, … , 65, and dispersion param-

eter α. Then the negative binomial CUSUM is applied to this time series and

one checks if T A j ≤65 . The probability of interest using k such realizations

can then be estimated as ∑

j

1 6( ) , where I (·) is the indicator func-

tion. Code-wise this can be done for k = 1000 for a grid of g 's as follows.

IT

≤ /

k

=

A

R> simone.TAleq65 <- function(sts, g) {

observed(sts)[phase2,]<-rnbinom(length(mu0), mu = mu0, size =

mtheta)

one <-glrnb(sts, control = modifyList(control(s.nb),

list(c.ARL = g)))

return(any(alarms(one)))

}

R> g.grid <-seq(1.8, by = 0.5)

R> pMC <-sapply(g.grid, function(g) {

mean(replicate(1000,simone)))

})

Figure 12.7 shows the result. We note that g ≈ 4.75 ensures that the false-

alarm probability within the monitoring period drops below the desired

level of 10%. If one is interested in P ( T A ≤ 6 5|τ = 1) instead, μ 1, t has to be used

as argument mu in rnbinom .

A different option to compute this false-alarm probability for a likeli-

hood ratio-based CUSUM is to use a Markov chain approximation to deter-

mine the PMF of the run-length variable. This approach implemented in

Biosurveillance: Methods and Case Studies

Search WWH ::

Custom Search

Home