Simulating and Evaluating Biosurveillance Datasets - Biosurveillance: Methods and Case Studies


2.4.2 Multivariate Tests

The preceding χ² tests can uncover only univariate disparities between the original and mimicked data. To also account for the covariance between the series, we turn to multivariate goodness-of-fit tests. Although it is not obvious that such a test can be performed in a distribution-free manner, several methods have been developed to do so (notably, Bickel 1969; Friedman and Rafsky 1979; Schilling 1986; Kim and Foutz 1987; Henze 1988; Hall and Tajvidi 2002).

In this chapter, we use the nearest-neighbor test described in Schilling (1986) because of its asymptotic normality and computational tractability. The two samples are pooled, and the k nearest neighbors of every point in the combined sample are computed. Each neighbor then defines an indicator variable recording whether or not it belongs to the same class (sample) as the point whose neighbor it is. The test statistic T, the proportion of these k-nearest-neighbor pairs that share the same class, is used to test equality of distributions. If both samples have the same size and come from the same distribution, T approaches 0.5 as the sample size increases. If the two samples differ in distribution, points tend to be surrounded by members of their own sample, so T tends to be larger than 0.5. After appropriate centering and scaling, T has an approximately standard normal distribution. For an example, see Figure 2.2.
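The statistic T can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions, not the chapter's implementation: it uses Euclidean distance on the pooled sample and breaks distance ties by index order, and in place of Schilling's normal approximation (whose variance formula is not reproduced here) it calibrates T with a label-permutation test, which is a swapped-in technique. The function names (`knn_indices`, `same_class_proportion`, `null_mean`, `nn_two_sample_test`), the default k = 3, and the number of permutations are hypothetical choices. `null_mean` returns the exact expectation of T when both samples share a distribution, E[T] = [n1(n1 - 1) + n2(n2 - 1)] / [N(N - 1)] with N = n1 + n2. For n1 = n2 = n this is (n - 1)/(2n - 1), which tends to 0.5 as in the text.

```python
import numpy as np


def knn_indices(points, k):
    """Indices of the k nearest neighbours (Euclidean) of every row of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour
    # Stable sort: distance ties are broken by index order (an assumption).
    return np.argsort(dist, axis=1, kind="stable")[:, :k]


def same_class_proportion(nn, labels):
    """T: share of (point, neighbour) pairs whose sample labels agree."""
    return float(np.mean(labels[nn] == labels[:, None]))


def null_mean(n1, n2):
    """Exact E[T] when both samples come from the same distribution."""
    n = n1 + n2
    return (n1 * (n1 - 1) + n2 * (n2 - 1)) / (n * (n - 1))


def nn_two_sample_test(x, y, k=3, n_perm=999, seed=0):
    """Return (T, one-sided permutation p-value) for samples x and y (hypothetical helper)."""
    # Treat 1-D input as n observations of a single variable.
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    pooled = np.vstack([x, y])
    labels = np.concatenate([np.zeros(len(x), dtype=int), np.ones(len(y), dtype=int)])

    nn = knn_indices(pooled, k)  # neighbour graph is fixed; only labels are permuted
    t_obs = same_class_proportion(nn, labels)

    rng = np.random.default_rng(seed)
    t_perm = np.array([same_class_proportion(nn, rng.permutation(labels))
                       for _ in range(n_perm)])
    # Large T suggests the samples differ, so count permutations at least as large.
    p_value = (1 + np.sum(t_perm >= t_obs)) / (n_perm + 1)
    return t_obs, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Same N(0, 1) marginals, opposite correlation: each series alone looks identical.
    original = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=100)
    mimicked = rng.multivariate_normal([0, 0], [[1, -0.8], [-0.8, 1]], size=100)
    t, p = nn_two_sample_test(original, mimicked, k=3)
    print(f"T = {t:.3f} (null mean {null_mean(100, 100):.3f}), p = {p:.4f}")
```

Because permuting the labels leaves the pooled points unchanged, the neighbor graph is computed once and reused for every permutation. The full O(N²) distance matrix is adequate for moderately sized series; for long series a k-d tree neighbor search would be the more economical choice.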

[Figure 2.1: A portion of a single time series being binned. Axes: Day; Binned Counts.]
