Environmental Engineering Reference
In-Depth Information
Table 1.10 Simulated bivariate X data with n_12 = 10, n_13 = 11, and n_23 = 9

      Bivariate dataset #1    Bivariate dataset #2    Bivariate dataset #3
 k         X_1         X_2         X_1         X_3         X_2         X_3
 1        1.20        1.72       −0.63       −0.59       −1.30       −0.16
 2        0.67       −0.69        0.28        1.22       −0.53       −0.35
 3        1.74        0.27       −1.22       −1.49        0.30       −1.94
 4        0.19       0.023       −0.47       −0.83       −0.11     −0.0038
 5        1.68        0.81        2.08       0.020       −1.58       −1.87
 6       −1.28        0.17     −0.0093       −1.43        0.64       −0.94
 7        1.49        0.60       −0.94       −0.77       −0.87       −2.99
 8        0.27        0.79        0.49        0.87       −0.39       −1.38
 9       −1.43        1.41       −2.16       −2.12        1.34        0.59
10       −0.51        1.51        0.41       −1.07
11                                0.12       −0.78
take values <0.677. This restriction is related to the concept of matrix positive definiteness. The eigenspectrum of C contains only positive values if and only if C is positive-definite. The C matrix in Equation 1.62 is not positive-definite: it has a negative eigenvalue of −0.2089. Positive definiteness can be guaranteed only if the correlation matrix is estimated from multivariate data (using Equation 1.57) and if n ≫ d. The C matrix estimated using the entry-by-entry bivariate method in Equation 1.58 is not guaranteed to be positive-definite. Examples of nonpositive-definite C matrices produced from actual data are shown in Section 1.7.3.
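As a minimal sketch of how such a check might be carried out, the following Python function tests a symmetric matrix for positive definiteness by attempting a Cholesky factorization, which succeeds with strictly positive pivots only when every eigenvalue is positive. This swaps an explicit eigenvalue computation for an equivalent test. The function name is_positive_definite, the tolerance tol, and both example matrices are hypothetical; the entries of Equation 1.62 are not reproduced here.

```python
import math


def is_positive_definite(c, tol=1e-12):
    """Return True if the symmetric matrix c (list of lists) is positive-definite.

    Attempts a Cholesky factorization c = L L^T. A pivot that is not
    strictly positive means c has a zero or negative eigenvalue.
    """
    d = len(c)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = c[i][i] - s
                if pivot <= tol:
                    return False  # factorization breaks down
                lower[i][i] = math.sqrt(pivot)
            else:
                lower[i][j] = (c[i][j] - s) / lower[j][j]
    return True


# Hypothetical correlation matrix assembled entry by entry
# (illustrative values only, NOT those of Equation 1.62).
c_inconsistent = [[1.0, -0.9, 0.9],
                  [-0.9, 1.0, 0.9],
                  [0.9, 0.9, 1.0]]

# Hypothetical consistent correlation matrix for comparison.
c_consistent = [[1.0, 0.5, 0.5],
                [0.5, 1.0, 0.5],
                [0.5, 0.5, 1.0]]

print(is_positive_definite(c_inconsistent))  # False
print(is_positive_definite(c_consistent))    # True
```

In the first matrix, X_1 and X_2 are each strongly positively correlated with X_3 yet strongly negatively correlated with each other, which is the kind of inconsistency that pairwise estimates from separate datasets can produce.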
To illustrate the absurdity of the C matrix in Equation 1.62, consider a random variable Y = X_1 + X_2 − X_3. Such a linear sum is commonly encountered in practice, usually in the context of a first-order Taylor series expansion of a nonlinear function. The variance of Y is equal to
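\[
\begin{aligned}
\mathrm{Var}(Y) &= \sigma_1^2 + \sigma_2^2 + \sigma_3^2 + 2\delta_{12}\sigma_1\sigma_2 - 2\delta_{13}\sigma_1\sigma_3 - 2\delta_{23}\sigma_2\sigma_3 \\
&= 3 + 2\delta_{12} - 2\delta_{13} - 2\delta_{23} = -0.4
\end{aligned}
\tag{1.63}
\]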
where σ_i = 1 is the standard deviation of X_i. Note that the variance of any random variable is non-negative by definition. The nonpositive-definite C matrix in Equation 1.62 can produce a negative variance, as shown in Equation 1.63. Hence, positive definiteness is not an academic concept that we can safely ignore in practice, notwithstanding its rather abstract nature.
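The same effect can be reproduced numerically. The sketch below evaluates Var(Y) as the double sum of a_i a_j σ_i σ_j δ_ij for Y = X_1 + X_2 − X_3, that is a = (1, 1, −1), with unit standard deviations. The helper name variance_of_linear_sum and the correlation values are hypothetical and are not those of Equation 1.62.

```python
def variance_of_linear_sum(a, sigma, c):
    """Variance of Y = sum_i a[i] * X_i.

    a     : coefficients of the linear sum
    sigma : standard deviations of the X_i
    c     : correlation matrix (list of lists) with entries delta_ij
    """
    d = len(a)
    return sum(a[i] * a[j] * sigma[i] * sigma[j] * c[i][j]
               for i in range(d) for j in range(d))


a = [1.0, 1.0, -1.0]      # Y = X_1 + X_2 - X_3
sigma = [1.0, 1.0, 1.0]   # sigma_i = 1, as in the text

# Hypothetical nonpositive-definite matrix (illustrative values only).
c_inconsistent = [[1.0, -0.9, 0.9],
                  [-0.9, 1.0, 0.9],
                  [0.9, 0.9, 1.0]]

# Hypothetical positive-definite matrix for comparison.
c_consistent = [[1.0, 0.5, 0.5],
                [0.5, 1.0, 0.5],
                [0.5, 0.5, 1.0]]

var_bad = variance_of_linear_sum(a, sigma, c_inconsistent)
var_ok = variance_of_linear_sum(a, sigma, c_consistent)
print(var_bad)  # approximately -2.4, an impossible (negative) variance
print(var_ok)   # approximately 2.0
```

With the inconsistent matrix the sum reduces to 3 + 2(−0.9) − 2(0.9) − 2(0.9), which is negative, whereas a positive-definite matrix yields a positive variance for every nonzero choice of a.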
1.4.3.2 Goodness-of-fit test
Multivariate normality requires separate checks. For example, if the scatter plot of X_i versus X_j shows a distinct nonlinear trend, then the multivariate normal distribution assumption is not suitable. There are numerous formal tests for multivariate normality in the statistics literature, but the state of practice is less established than formal tests for univariate normality (e.g., K-S test). The first method is the generalization of the line test in Section 1.3.3. This method is applicable to a nonstandard multivariate normal distribution with an arbitrary dimension (d) and is based on the fact that the Mahalanobis distance Q_d between