There is an important engineering concept related to the CDF: the percentile. The
probability of finding values smaller than the η-percentile, $y_\eta$, is η, that is,
\[ \Phi\left(\frac{y_\eta - \mu}{\sigma}\right) = \eta \tag{1.8} \]
and hence
\[ y_\eta = \mu + \sigma \times \Phi^{-1}(\eta) \tag{1.9} \]
$\Phi^{-1}(\eta)$ can be evaluated using the EXCEL function NORMSINV(η) or the MATLAB function
norminv(η). It is clear from Equation 1.9 that the percentile is simply the inverse CDF. The
concept of a percentile is widely used to define a characteristic value in design codes. For
example, Eurocode 7 (BS EN1997-1:2004) mentions in Section 2.4.5.2 on “Characteristic
values of geotechnical parameters” Clause (11) that “If statistical methods are used, the
characteristic value should be derived such that the calculated probability of a worse value
governing the occurrence of the limit state under consideration is not greater than 5%”
(p. 28). For the normal distribution, this 5% percentile or characteristic value will be given by
\[ y_\eta = \mu + \sigma \times \Phi^{-1}(0.05) = \mu - 1.64 \times \sigma = \mu \times (1 - 1.64 \times \mathrm{COV}) \tag{1.10} \]
Equation 1.10 is not applicable to non-normal distributions.
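As a quick numerical check of Equations 1.9 and 1.10, the sketch below evaluates the 5% percentile of a normal variable in Python, using the standard-library `statistics.NormalDist.inv_cdf` in place of NORMSINV or norminv. The mean of 100 kPa and standard deviation of 20 kPa are illustrative assumptions, the helper name `normal_percentile` is hypothetical, and COV is taken to be σ/µ, which is what Equation 1.10 implies.

```python
from statistics import NormalDist


def normal_percentile(eta, mu, sigma):
    """Return y_eta = mu + sigma * Phi^{-1}(eta), as in Equation 1.9."""
    z = NormalDist().inv_cdf(eta)  # inverse CDF of the standard normal
    return mu + sigma * z


# Illustrative values only (assumed, in kPa).
mu, sigma = 100.0, 20.0
cov = sigma / mu  # coefficient of variation implied by Equation 1.10

# 5% percentile from the inverse CDF (Equation 1.9 with eta = 0.05).
y_exact = normal_percentile(0.05, mu, sigma)

# Same percentile from the rounded form in Equation 1.10.
y_rounded = mu * (1.0 - 1.64 * cov)

print(f"5% percentile via inverse CDF: {y_exact:.2f} kPa")
print(f"5% percentile via Eq. 1.10:    {y_rounded:.2f} kPa")
```

The two results differ slightly (about 67.1 kPa versus 67.2 kPa) only because Equation 1.10 rounds the standard normal quantile to 1.64.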
It is possible to estimate the CDF of Y, given the data points of Y:
\[ F(y) = P(Y \le y) \approx \frac{1}{n} \sum_{k=1}^{n} I\left(Y^{(k)} \le y\right) = F_n(y) \tag{1.11} \]
where $Y^{(k)}$ is the kth data point of Y; n is the total number of data points; and I(·) is the
indicator function: it is unity if the enclosed statement is true and zero otherwise. The CDF, $F_n(y)$,
estimated by Equation 1.11 from a finite sample size (n) is called the “empirical” CDF. The
analytical CDF shown in Equation 1.5 or 1.6 is called a “theoretical” CDF. There are several
estimators for the CDF. Equation 1.11 is called the Kaplan-Meier estimator. Another
rather common estimator is the median estimator:
\[ F_n(y) = \frac{1}{n + 0.4} \left[ \sum_{k=1}^{n} I\left(Y^{(k)} \le y\right) - 0.3 \right] \tag{1.12} \]
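The two estimators in Equations 1.11 and 1.12 can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the five sample values are hypothetical, and the indicator is assumed to test whether a data point is less than or equal to y.

```python
def ecdf_basic(data, y):
    """Equation 1.11: fraction of data points less than or equal to y."""
    n = len(data)
    count = sum(1 for value in data if value <= y)  # sum of indicator values
    return count / n


def ecdf_median(data, y):
    """Equation 1.12: (count - 0.3) / (n + 0.4), the median estimator."""
    n = len(data)
    count = sum(1 for value in data if value <= y)
    return (count - 0.3) / (n + 0.4)


# Hypothetical sample (kPa), for illustration only.
sample = [82.0, 95.0, 101.0, 110.0, 124.0]

for y in sorted(sample):
    print(f"y = {y:6.1f}  Eq. 1.11: {ecdf_basic(sample, y):.3f}  "
          f"Eq. 1.12: {ecdf_median(sample, y):.3f}")
```

At the largest data point, Equation 1.11 returns exactly 1, whereas Equation 1.12 stays below 1. Note that Equation 1.12, taken literally, is slightly negative for y below every data point, so it is most naturally evaluated at the data points themselves.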
Figure 1.7a shows two empirical CDFs (ECDFs) from two different simulated data sets of size
n = 10. The theoretical normal CDF with a mean of 100 kPa and a standard deviation of 20 kPa
is plotted as a solid line. It is clear that the ECDFs are different and are scattered around the
theoretical line. This is how statistical uncertainty manifests itself in a CDF plot. Figure 1.7b is
the same as Figure 1.7a, except that the sample size increases to 100. The scatter is now smaller.
It can be proved theoretically that statistical uncertainty decreases with the sample size.
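The shrinking scatter can also be observed numerically. The sketch below repeatedly simulates normal samples with a mean of 100 kPa and a standard deviation of 20 kPa, and records the largest gap between the ECDF of Equation 1.11 and the theoretical CDF. The maximum-gap measure, the seed, the number of trials, and the function names are all assumptions made for this illustration; they are not taken from the text or from Figure 1.7.

```python
import random
from statistics import NormalDist


def max_ecdf_deviation(sample, dist):
    """Largest gap between the ECDF (Equation 1.11) and the theoretical CDF."""
    ordered = sorted(sample)
    n = len(ordered)
    worst = 0.0
    for k, y in enumerate(ordered, start=1):
        theo = dist.cdf(y)
        # The ECDF steps from (k - 1)/n to k/n at y, so check both sides.
        worst = max(worst, abs(k / n - theo), abs((k - 1) / n - theo))
    return worst


def mean_deviation(n, trials, dist, rng):
    """Average the maximum gap over several simulated data sets of size n."""
    total = 0.0
    for _ in range(trials):
        sample = [rng.gauss(dist.mean, dist.stdev) for _ in range(n)]
        total += max_ecdf_deviation(sample, dist)
    return total / trials


rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)
theoretical = NormalDist(mu=100.0, sigma=20.0)

dev_10 = mean_deviation(10, 200, theoretical, rng)
dev_100 = mean_deviation(100, 200, theoretical, rng)

print(f"average maximum gap, n = 10:  {dev_10:.3f}")
print(f"average maximum gap, n = 100: {dev_100:.3f}")
```

Under these assumptions, the average gap for n = 100 should come out clearly smaller than for n = 10, in line with the behaviour described for Figure 1.7.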
The CDF is less visually appealing than the PDF because an engineer cannot
see the frequency of occurrences over a given range of values as readily as in
a histogram/PDF. Nonetheless, it is important to note that the ECDF can be constructed