Environmental Engineering Reference
The correlation coefficient should only be calculated
when the relationship between two random variables is
thought to be linear. When the relationship is nonlinear,
data transforms can sometimes be used to make the
relationship more linear. It must always be kept in mind
that a high correlation between two variables does not
imply a cause-and-effect relationship. In addition,
although independent variables are generally uncorrelated,
uncorrelated variables are not always independent.
For example, two variables might be closely related
by a nonlinear function, and a calculation of the correlation
coefficient might still yield $r_{xy} = 0$.
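The point about uncorrelated but dependent variables can be illustrated with a minimal sketch. The helper name `pearson_r` and the sample values are assumptions chosen for illustration; here y is an exact function of x (y = x^2), yet the computed correlation coefficient is zero because x is symmetric about its mean.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient r_xy of two equal-length sequences."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squared deviations about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Illustrative data (not from the text): y depends entirely on x, but nonlinearly.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(r)  # zero: no linear association, despite complete dependence
```

A transform such as replacing x by x^2 would make this particular relationship linear, which is the kind of data transform mentioned above.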
\[
r_{xy} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}
{\left[\sum_{i=1}^{N} (x_i - \bar{x})^2\right]^{1/2} \left[\sum_{i=1}^{N} (y_i - \bar{y})^2\right]^{1/2}}
= \frac{127.4}{[892.6]^{1/2}\,[184.0]^{1/2}} = 0.314
\]
Hence, the estimated correlation coefficient is 0.314.
The 95% confidence limits for $r_{xy}$ are calculated using
Equation (10.122), where $t^* = t_{0.025}$ with $N - 2 = 20$
degrees of freedom, and Appendix C.2 gives
$t_{0.025} = 2.086$. Substituting into Equation (10.122) gives

\[
t^* = r_{xy} \sqrt{\frac{N - 1}{1 - r_{xy}^2}}
\quad\Rightarrow\quad
2.086 = r_{xy} \sqrt{\frac{22 - 1}{1 - r_{xy}^2}}
\]

which yields $r_{xy} = \pm 0.414$. Since the calculated value of
$r_{xy}$ (= 0.314) is within this range, the correlation
coefficient is not significantly different from zero at the
95% confidence level.

EXAMPLE 10.24
Several simultaneous measurements of two water-
quality variables, X and Y, are as follows:

x      y      x      y      x      y      x      y
11.78  57.25  17.73  59.71  23.97  58.48  29.24  64.54
12.07  55.89  18.90  60.59  24.32  57.36  30.77  64.19
13.62  58.26  19.72  52.89  25.59  59.94  31.78  59.86
14.62  58.79  20.12  55.20  26.31  55.29  32.60  57.17
15.25  55.63  21.93  60.30  27.79  54.44  -      -
16.40  55.55  22.43  55.41  28.30  56.66  -      -

Determine the correlation coefficient between X and
Y, and assess whether there is significant correlation at
the 5% significance level.
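The calculation for this example can be checked numerically with a short sketch. The function names `correlation_summary` and `critical_r` are assumptions introduced here, and the data are read from the example table one column pair at a time. The critical value follows the substitution shown in the solution, with 22 − 1 under the root; using N − 2 there instead, as in the more common form of the t statistic for a correlation coefficient, would give roughly 0.423 and the same conclusion.

```python
import math

# (x, y) pairs from the example table, read column pair by column pair
data = [
    (11.78, 57.25), (12.07, 55.89), (13.62, 58.26),
    (14.62, 58.79), (15.25, 55.63), (16.40, 55.55),
    (17.73, 59.71), (18.90, 60.59), (19.72, 52.89),
    (20.12, 55.20), (21.93, 60.30), (22.43, 55.41),
    (23.97, 58.48), (24.32, 57.36), (25.59, 59.94),
    (26.31, 55.29), (27.79, 54.44), (28.30, 56.66),
    (29.24, 64.54), (30.77, 64.19), (31.78, 59.86),
    (32.60, 57.17),
]

def correlation_summary(pairs):
    """Return N, the two means, and the sample correlation coefficient."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    s_yy = sum((y - y_bar) ** 2 for _, y in pairs)
    r = s_xy / math.sqrt(s_xx * s_yy)
    return n, x_bar, y_bar, r

def critical_r(t_star, m):
    """Solve t* = r * sqrt(m / (1 - r^2)) for the magnitude of r."""
    return t_star / math.sqrt(t_star ** 2 + m)

n, x_bar, y_bar, r = correlation_summary(data)
t_star = 2.086                     # t_0.025 with 20 degrees of freedom, as quoted in the text
r_crit = critical_r(t_star, n - 1)  # assumption: N - 1 under the root, as in the text's substitution
significant = abs(r) > r_crit

print(f"N = {n}, mean x = {x_bar:.2f}, mean y = {y_bar:.2f}")
print(f"r_xy = {r:.3f}, critical |r| = {r_crit:.3f}, significant: {significant}")
```

The printed values should be close to those quoted in the solution; small differences in the last digit can arise because the text rounds intermediate sums.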
The correlation coefficient is sometimes used to assess
if there is a (linear) relationship between values of the
same variable measured at different times. The sequence
of such variables is called a time series, and the calculated
correlations are called serial correlations or autocorrelations.
Serial correlations are typically
calculated at different time lags between measurements.
At time lags where the correlation coefficient is not
significantly different from zero, sample measurements
are commonly treated as independent.
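A serial correlation at a given lag can be sketched as the ordinary correlation coefficient between the series and a copy of itself shifted by that lag. This is one common estimator, not necessarily the one used elsewhere in this reference; the function name `lag_correlation` and the sample series are assumptions for illustration.

```python
import math

def lag_correlation(series, lag):
    """Correlation between values of a series separated by `lag` time steps."""
    if lag < 1 or lag >= len(series) - 1:
        raise ValueError("lag must leave at least two overlapping pairs")
    head = series[:-lag]   # values at time t
    tail = series[lag:]    # values at time t + lag
    n = len(head)
    h_bar = sum(head) / n
    t_bar = sum(tail) / n
    s_ht = sum((h - h_bar) * (t - t_bar) for h, t in zip(head, tail))
    s_hh = sum((h - h_bar) ** 2 for h in head)
    s_tt = sum((t - t_bar) ** 2 for t in tail)
    # A constant series has zero variance and would divide by zero here.
    return s_ht / math.sqrt(s_hh * s_tt)

# Illustrative series (not from the text)
trend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(lag_correlation(trend, 1))        # strongly positive for a steady trend
print(lag_correlation(alternating, 1))  # strongly negative for an alternating series
```

Evaluating the function over a range of lags and comparing each value with the critical correlation for the number of overlapping pairs gives the lags at which the measurements can be treated as uncorrelated.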
Solution
From the given data, N = 22, and the means of x and y
are calculated as follows:
\[
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i = 22.06
\]
\[
\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i = 57.88
\]
Using these means, the variance terms are:

\[
\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y}) = 127.4
\]
\[
\sum_{i=1}^{N} (x_i - \bar{x})^2 = 892.6
\]
\[
\sum_{i=1}^{N} (y_i - \bar{y})^2 = 184.0
\]

and $r_{xy}$ is then calculated from these sums using Equation (10.121).

10.10.2 Regression Analysis
The objective of regression analysis is to determine an
equation and associated parameter values that adequately
describe the relationship between two or more
variables. A common approach to regression analysis is
to first select the functional form of the equation to be
matched to the data, and then adjust the parameters of
the equation until the sum of the squares of the deviations
of the data from the assumed function is minimized.
A limitation of this approach is that
there are infinitely many possible functions that
might fit the data, and it is generally not possible to
determine which is best. An alternative approach is to
 