2.4.2 Multivariate Tests
The preceding χ² tests can only uncover univariate disparities between the
original and mimicked data. To also account for the covariance between the
series, we consider multivariate goodness-of-fit tests. While it is not
obvious that such a test can be performed in a distribution-free manner,
several methods have been developed to do so (notably, Bickel 1969; Friedman
and Rafsky 1979; Schilling 1986; Kim and Foutz 1987; Henze 1988; Hall and
Tajvidi 2002).
In this chapter, we use the nearest-neighbors test described in Schilling
(1986), because of its asymptotic normality and computational tractability.
Under this test, the k nearest neighbors of each point in the combined
sample are computed. Each neighbor then determines an indicator variable:
whether or not it shares the same class (that is, comes from the same
sample) as the point it neighbors. The statistic T, the proportion of
k-nearest neighbors sharing the same class, is used to test equality of
distributions. If both samples have the same size and come from the same
distribution, T will approach 0.5 as the sample size increases. If the two
samples differ in distribution, then T will tend to be larger than 0.5.
After appropriate centering and scaling, T has an approximate standard
normal distribution. For an example, see Figure 2.2.
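The computation of T described above can be sketched in a few lines of
Python. This is a minimal illustration and not the chapter's own
implementation: the function names, the brute-force neighbor search, and
the simulated Gaussian samples are all assumptions made for the example. In
particular, the significance level here comes from a label-permutation
test, which is swapped in for the asymptotic normal correction of Schilling
(1986) because the latter requires variance constants not given in this
section.

```python
import random


def nearest_neighbor_indices(points, k):
    """For each point, return the indices of its k nearest other points.

    Brute-force search with squared Euclidean distance (hypothetical helper).
    """
    n = len(points)
    neighbors = []
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
            for j in range(n)
            if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def same_class_proportion(neighbors, labels):
    """Statistic T: share of (point, neighbor) pairs with equal labels."""
    total = sum(len(nb) for nb in neighbors)
    same = sum(
        labels[i] == labels[j] for i, nb in enumerate(neighbors) for j in nb
    )
    return same / total


def permutation_test(points, labels, k, n_perm=200, seed=0):
    """Return (T, p-value), with the p-value from shuffling the labels.

    NOTE: the permutation reference distribution is an assumption of this
    sketch, used instead of the asymptotic normal approximation.
    """
    rng = random.Random(seed)
    neighbors = nearest_neighbor_indices(points, k)  # geometry is fixed
    observed = same_class_proportion(neighbors, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if same_class_proportion(neighbors, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


# Simulated two-dimensional samples (illustrative data only).
rng = random.Random(0)
original = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
mimicked = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
shifted = [(x + 10.0, y) for x, y in mimicked]  # clearly different sample
labels = [0] * 30 + [1] * 30

t_same, p_same = permutation_test(original + mimicked, labels, k=3)
t_diff, p_diff = permutation_test(original + shifted, labels, k=3)
print("same distribution:      T = %.3f, p = %.3f" % (t_same, p_same))
print("different distribution: T = %.3f, p = %.3f" % (t_diff, p_diff))
```

Because the neighbor structure depends only on the point positions, it is
computed once and reused for every label shuffle; only the class
indicators change between permutations.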
[Figure: axis label "Day"; panel label "Binned Counts"]
Figure 2.1
A portion of a single time series being binned.