Environmental Engineering Reference
TABLE 10.2. Cumulative Distribution Function of Concentration Measurements

Rank  F(c)   c (mg/l)    Rank  F(c)   c (mg/l)    Rank  F(c)   c (mg/l)
 1    0.980  15.49       11    0.649  3.29        21    0.318  1.98
 2    0.947  14.30       12    0.616  3.26        22    0.285  1.83
 3    0.914  14.30       13    0.583  3.14        23    0.252  1.69
 4    0.881  11.57       14    0.550  2.89        24    0.219  1.63
 5    0.848   6.41       15    0.517  2.72        25    0.185  1.34
 6    0.815   4.91       16    0.483  2.67        26    0.152  1.17
 7    0.781   4.88       17    0.450  2.52        27    0.119  1.10
 8    0.748   4.71       18    0.417  2.52        28    0.086  1.02
 9    0.715   4.28       19    0.384  2.48        29    0.053  0.78
10    0.682   4.00       20    0.351  2.20        30    0.020  0.39

The CDF for a log-normal distribution can be expressed as

\[
F(c) = \Phi\left(\frac{\ln c - \mu_y}{\sigma_y}\right)
\quad\text{or}\quad
F(c) = \frac{1}{2} + \frac{1}{2}\,\operatorname{erf}\left(\frac{\ln c - \mu_y}{\sigma_y\sqrt{2}}\right)
\]

where Φ(·) represents the CDF of the standard normal deviate. Evaluation of the log-normal CDF can be easily done using built-in statistical functions found in most spreadsheet and statistical-analysis programs. The sample values of F(c) are compared with the theoretical values of F(c) (with μ_y = 1.20 and σ_y = 0.80) in Figure 10.7. Based on this comparison, it is apparent that the sample values of F(c) are consistently greater than the theoretical values of F(c) for c < 7 mg/l. Therefore, based on this visual comparison, there does not seem to be a close fit between the sample distribution and the proposed log-normal population distribution. However, a quantitative comparison using a statistical measure should be performed prior to drawing any statistically significant conclusion.

10.5.2 Comparisons of Probability Distributions
In addition to the visual comparison of the sample probability distribution with various theoretical probability distributions, quantitative comparisons are also made using hypothesis tests. The two most common hypothesis tests for assessing whether an observed probability distribution can be approximated by a given (theoretical) population distribution are the chi-square test and the Kolmogorov-Smirnov (KS) test.
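
The comparison between the sample values of F(c) in Table 10.2 and the log-normal CDF can also be checked numerically. The following is a minimal sketch in Python, assuming that the subscript y denotes y = ln c and using the quoted values μ_y = 1.20 and σ_y = 0.80. The function and variable names are illustrative only and are not taken from the source.

```python
import math
from statistics import NormalDist

MU_Y = 1.20     # mean of y = ln c (value quoted in the text)
SIGMA_Y = 0.80  # standard deviation of y = ln c (value quoted in the text)

# (c in mg/l, sample F(c)) pairs transcribed from Table 10.2
SAMPLE = [
    (15.49, 0.980), (14.30, 0.947), (14.30, 0.914), (11.57, 0.881),
    (6.41, 0.848), (4.91, 0.815), (4.88, 0.781), (4.71, 0.748),
    (4.28, 0.715), (4.00, 0.682), (3.29, 0.649), (3.26, 0.616),
    (3.14, 0.583), (2.89, 0.550), (2.72, 0.517), (2.67, 0.483),
    (2.52, 0.450), (2.52, 0.417), (2.48, 0.384), (2.20, 0.351),
    (1.98, 0.318), (1.83, 0.285), (1.69, 0.252), (1.63, 0.219),
    (1.34, 0.185), (1.17, 0.152), (1.10, 0.119), (1.02, 0.086),
    (0.78, 0.053), (0.39, 0.020),
]


def lognormal_cdf_phi(c, mu_y=MU_Y, sigma_y=SIGMA_Y):
    """Log-normal CDF written with the standard normal CDF, Phi."""
    return NormalDist().cdf((math.log(c) - mu_y) / sigma_y)


def lognormal_cdf_erf(c, mu_y=MU_Y, sigma_y=SIGMA_Y):
    """Equivalent log-normal CDF written with the error function."""
    z = (math.log(c) - mu_y) / (sigma_y * math.sqrt(2.0))
    return 0.5 + 0.5 * math.erf(z)


def compare(sample=SAMPLE):
    """Return (c, sample F, theoretical F) for every measurement."""
    return [(c, f_s, lognormal_cdf_erf(c)) for c, f_s in sample]


if __name__ == "__main__":
    print("c (mg/l)  sample F(c)  log-normal F(c)")
    for c, f_s, f_t in compare():
        print(f"{c:8.2f}  {f_s:11.3f}  {f_t:15.3f}")
```

Both forms of the CDF should agree to within rounding error, so either can be used; the erf form needs only the `math` module.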
10.5.2.1 The Chi-Square Test. Based on sampling theory, it is known that if the N outcomes are divided into M classes, with X_m being the number of outcomes in class m, and p_m being the theoretical probability of an outcome being in class m, then the random variable,

\[
\chi^2 = \sum_{m=1}^{M} \frac{(X_m - N p_m)^2}{N p_m} \tag{10.54}
\]

has a chi-square distribution. The number of degrees of freedom is M − 1 if the expected frequencies can be computed without having to estimate the population parameters from the sample statistics, while the number of degrees of freedom is M − 1 − n if the expected frequencies are computed by estimating n population parameters from sample statistics. In applying the chi-square goodness-of-fit test, the null hypothesis is that the samples are drawn from the proposed population probability distribution. The null hypothesis is accepted at the α significance level if χ² ∈ [0, χ²_α] and rejected otherwise.
The effectiveness of the chi-square test is diminished if both the number of data intervals (called cells) and the expected number of outcomes in any cell are less than 5 (Haldar and Mahadevan, 2000; McCuen, 2002a).
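
The chi-square statistic, the degrees of freedom, and the acceptance rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a definitive implementation: the function names are hypothetical, the class counts in the example are made up for illustration (they are not taken from the source), and the critical value χ²_α is assumed to be supplied by the user from a standard chi-square table.

```python
def chi_square_statistic(observed, probabilities):
    """Sum over classes of (X_m - N p_m)^2 / (N p_m)."""
    if len(observed) != len(probabilities):
        raise ValueError("need one theoretical probability per class")
    n_total = sum(observed)  # N, the total number of outcomes
    stat = 0.0
    for x_m, p_m in zip(observed, probabilities):
        expected = n_total * p_m  # expected number of outcomes in class m
        stat += (x_m - expected) ** 2 / expected
    return stat


def degrees_of_freedom(n_classes, n_estimated_params=0):
    """M - 1 - n, where n parameters were estimated from the sample."""
    return n_classes - 1 - n_estimated_params


def accept_null(stat, critical_value):
    """Accept the null hypothesis if the statistic lies in [0, critical value]."""
    return 0.0 <= stat <= critical_value


if __name__ == "__main__":
    # Hypothetical example: N = 30 outcomes in M = 5 equally likely classes.
    observed = [8, 7, 6, 5, 4]
    probabilities = [0.2] * 5
    stat = chi_square_statistic(observed, probabilities)
    dof = degrees_of_freedom(len(observed))  # no parameters estimated
    # 9.488 is the tabulated critical value for alpha = 0.05 and 4 degrees
    # of freedom; it is entered by hand here, not computed.
    print(stat, dof, accept_null(stat, 9.488))
```

In this made-up example every class has an expected count of 6, so the statistic is (4 + 1 + 0 + 1 + 4)/6 ≈ 1.67, which falls inside the acceptance interval for 4 degrees of freedom at α = 0.05. If the distribution parameters had been estimated from the sample, `n_estimated_params` would be set accordingly and a different critical value would apply.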
[Figure 10.7. Comparison of sample and log-normal distribution. Horizontal axis: c (mg/L).]