Environmental Engineering Reference
Table 1.10 Simulated bivariate X data with n_12 = 10, n_13 = 11, and n_23 = 9

        Bivariate dataset #1    Bivariate dataset #2    Bivariate dataset #3
 k      X_1       X_2           X_1       X_3           X_2       X_3
 1      1.20      1.72          0.63      0.59          1.30      0.16
 2      0.67      0.69          0.28      1.22          0.53      0.35
 3      1.74      0.27          1.22      1.49          0.30      1.94
 4      0.19      0.023         0.47      0.83          0.11      0.0038
 5      1.68      0.81          2.08      0.020         1.58      1.87
 6      1.28      0.17          0.0093    1.43          0.64      0.94
 7      1.49      0.60          0.94      0.77          0.87      2.99
 8      0.27      0.79          0.49      0.87          0.39      1.38
 9      1.43      1.41          2.16      2.12          1.34      0.59
10      0.51      1.51          0.41      1.07
11                              0.12      0.78
take values < 0.677. This restriction is related to the concept of matrix positive definiteness.
The eigenspectrum of C contains only positive values if and only if C is positive-definite.
The C matrix in Equation 1.62 fails this condition: it has a negative eigenvalue of −0.2089
and is therefore not positive-definite. Positive definiteness can be guaranteed only if the
correlation matrix C is estimated from a full multivariate dataset (X_1, X_2, …, X_d) as shown
in Table 1.9 (i.e., using Equation 1.57) and if n > d. The C matrix estimated using the
entry-by-entry bivariate method in Equation 1.58 is not guaranteed to be positive-definite.
Examples in which actual data produce a non-positive-definite C are shown in Section 1.7.3.
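A check of this kind is easy to automate. The following is a minimal sketch, not the procedure used in the text: instead of computing the eigenspectrum, it attempts a Cholesky factorization, which succeeds for a symmetric matrix exactly when the matrix is positive-definite. The function names and both 3 × 3 correlation matrices are hypothetical and chosen only for illustration; they are not the values of Equation 1.62.

```python
import math


def cholesky_lower(c):
    """Return the lower Cholesky factor of symmetric matrix c,
    or None if a non-positive pivot shows c is not positive-definite."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                pivot = c[i][i] - s
                if pivot <= 0.0:
                    return None  # factorization breaks down
                low[i][j] = math.sqrt(pivot)
            else:
                low[i][j] = (c[i][j] - s) / low[j][j]
    return low


def is_positive_definite(c):
    """True if the symmetric matrix c admits a Cholesky factorization."""
    return cholesky_lower(c) is not None


# Hypothetical correlation matrices (assumed values, not Equation 1.62).
c_inconsistent = [
    [1.0, 0.2, 0.9],
    [0.2, 1.0, 0.9],
    [0.9, 0.9, 1.0],
]
c_consistent = [
    [1.0, 0.2, 0.5],
    [0.2, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]

print(is_positive_definite(c_inconsistent))  # False
print(is_positive_definite(c_consistent))    # True
```

In the first matrix, X_3 is strongly correlated with both X_1 and X_2, yet X_1 and X_2 are only weakly correlated with each other. Each entry is a legitimate correlation on its own, but the three entries are mutually inconsistent, which is the kind of outcome an entry-by-entry estimate can produce.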
To illustrate the absurdity of the C matrix in Equation 1.62, consider a random variable
Y = X_1 + X_2 − X_3. Linear sums of this kind are commonly encountered, usually as the
first-order Taylor series expansion of a nonlinear function. The variance of Y is equal
to
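$$
\mathrm{Var}(Y) = \sigma_1^2 + \sigma_2^2 + \sigma_3^2 + 2\delta_{12}\sigma_1\sigma_2 - 2\delta_{13}\sigma_1\sigma_3 - 2\delta_{23}\sigma_2\sigma_3 = 3 + 2\delta_{12} - 2\delta_{13} - 2\delta_{23} = -0.4 \tag{1.63}
$$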
where σ_i = 1 is the standard deviation of X_i. Note that the variance of any random variable
is nonnegative by definition. The non-positive-definite C matrix in Equation 1.62 nevertheless
produces a negative variance, as shown in Equation 1.63. Hence, positive definiteness is not an
academic concept that we can safely ignore in practice, notwithstanding its rather abstract
nature.
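The variance of a linear sum can be evaluated for any set of coefficients as a double sum over the correlation matrix. The sketch below is illustrative only: the function name and the correlation values are hypothetical (they are not the entries of Equation 1.62), and unit standard deviations are assumed, as in the text.

```python
def linear_sum_variance(weights, corr, sds):
    """Variance of Y = sum_i weights[i] * X_i, computed as
    sum_i sum_j weights[i] * weights[j] * corr[i][j] * sds[i] * sds[j]."""
    n = len(weights)
    return sum(
        weights[i] * weights[j] * corr[i][j] * sds[i] * sds[j]
        for i in range(n)
        for j in range(n)
    )


weights = [1.0, 1.0, -1.0]  # Y = X_1 + X_2 - X_3
sds = [1.0, 1.0, 1.0]       # unit standard deviations, as assumed in the text

# Hypothetical, mutually inconsistent correlations (not Equation 1.62).
corr_inconsistent = [
    [1.0, 0.2, 0.9],
    [0.2, 1.0, 0.9],
    [0.9, 0.9, 1.0],
]
# Hypothetical positive-definite correlations for comparison.
corr_consistent = [
    [1.0, 0.2, 0.5],
    [0.2, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]

var_bad = linear_sum_variance(weights, corr_inconsistent, sds)
var_ok = linear_sum_variance(weights, corr_consistent, sds)

print(var_bad)  # approximately -0.2, an impossible (negative) variance
print(var_ok)   # approximately 1.4
```

For the inconsistent matrix the result is 3 + 2(0.2) − 2(0.9) − 2(0.9) = −0.2, which has the same structure as Equation 1.63. A positive-definite matrix cannot produce a negative value for any choice of coefficients.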
1.4.3.2 Goodness-of-fit test
Multivariate normality requires separate checks. For example, if the scatter plot of X_i versus
X_j shows a distinct nonlinear trend, then the multivariate normal distribution assumption
is not suitable. There are numerous formal tests for multivariate normality in the statistics
literature, but the state of practice is less established than for univariate normality
(e.g., the K-S test). The first method is a generalization of the line test in Section
1.3.3. This method is applicable to a nonstandard multivariate normal distribution of
arbitrary dimension (d) and is based on the fact that the Mahalanobis distance Q_d between
 