Geoscience Reference
In-Depth Information
influenced at all by the 25% of the data in either tail. It is, therefore, the width of the
non-zero weight window for the trimmed mean of Fig. 2.3.
The IQR is computed by subtracting the 25th percentile value from the
75th percentile value. The 75th (upper quartile), 50th (median) and 25th (lower
quartile) percentiles split the entire time series into four equal-sized
quarters. These three quartiles are used to depict the distribution of time
series data graphically as a box and whisker plot (see Section 3.1.3 of Chapter
3 for details). The 75th percentile ($P_{75}$), also called the 'upper quartile',
is a value which exceeds no more than 75% of the data and is exceeded by no
more than 25% of the data in a time series. The 25th percentile ($P_{25}$), or 'lower
quartile', is a value which exceeds no more than 25% of the data and is
exceeded by no more than 75% of the data in a time series. Consider a time
series whose data are arranged in ascending order of magnitude: $x_i$, $i$ = 1 to $n$. The
percentiles ($P_j$) are computed using the following formula (Helsel and Hirsch,
2002):
$$P_j = x_{(n+1)\,j} \tag{11}$$
where $n$ is the sample size of $x_i$, and $j$ is the fraction of data less than or equal
to the percentile value (for the 25th, 50th and 75th percentiles, $j$ = 0.25, 0.50 and
0.75, respectively).
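The percentile rule of Eqn. (11) and the IQR derived from it can be illustrated with a minimal Python sketch. The function names `percentile_rank` and `iqr` are hypothetical, and two details are assumptions not stated in the text above: non-integer positions $(n+1)j$ are handled by linear interpolation between adjacent ranked values, and positions falling outside 1 to $n$ are clamped to the nearest end of the ranked series.

```python
import math


def percentile_rank(data, j):
    """Percentile P_j = x_{(n+1) j} on data ranked in ascending order.

    Assumptions (not from the text): linear interpolation for
    non-integer positions, clamping of positions to the range 1..n.
    """
    x = sorted(data)              # rank from smallest to largest
    n = len(x)
    if n == 0:
        raise ValueError("data must not be empty")
    if not 0.0 < j < 1.0:
        raise ValueError("j must lie strictly between 0 and 1")

    pos = (n + 1) * j             # 1-based position in the ranked series
    pos = min(max(pos, 1.0), float(n))
    lower = int(math.floor(pos))  # rank just below (or at) the position
    frac = pos - lower            # fractional distance to the next rank
    if lower >= n:
        return float(x[-1])
    return x[lower - 1] + frac * (x[lower] - x[lower - 1])


def iqr(data):
    """Interquartile range: 75th percentile minus 25th percentile."""
    return percentile_rank(data, 0.75) - percentile_rank(data, 0.25)


# Hypothetical sample of n = 7 values, so (n + 1) j is an integer
# for all three quartiles and no interpolation is needed.
sample = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
quartiles = [percentile_rank(sample, j) for j in (0.25, 0.50, 0.75)]
print(quartiles, iqr(sample))
```

For this sample the positions $(n+1)j$ are 2, 4 and 6, so the quartiles are simply the 2nd, 4th and 6th ranked values. Other software may use different percentile conventions and return slightly different values for small samples.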
The range is the length of the smallest interval which contains all the data
of a time series. It is calculated as the difference between the maximum and
minimum values of the time series. Since it depends on only two of the
observations, it is a weak measure of dispersion except when the
sample size is large.
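The dependence of the range on just two observations can be seen in a short sketch. The function name `data_range` and the flow values are hypothetical and serve only as an illustration.

```python
def data_range(data):
    """Range: maximum value minus minimum value of the series."""
    values = list(data)
    if not values:
        raise ValueError("data must not be empty")
    return max(values) - min(values)


# Hypothetical series; appending one extreme observation changes the
# range completely, although the other values are untouched.
flows = [12.0, 15.0, 11.0, 14.0, 13.0]
flows_with_extreme = flows + [95.0]

print(data_range(flows))               # 15.0 - 11.0
print(data_range(flows_with_extreme))  # 95.0 - 11.0
```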
The coefficient of variation (CV) gives a normalized measure of spread
about the mean, and is estimated as:
$$\mathrm{CV}(\%) = \frac{s}{\bar{x}} \times 100 \tag{12}$$
The standard deviation of a data series must always be interpreted in the
context of the mean of that series. Because the CV is a dimensionless
number, it has an advantage over the standard deviation: when
comparing datasets with different units or widely different means,
one should use the coefficient of variation instead of the standard deviation.
However, the CV also has limitations in certain cases.
For example, when the mean of the data series is close to zero, the CV
approaches infinity and becomes highly sensitive to small changes in the mean.
Also, unlike the standard deviation, it cannot be used to construct confidence
intervals for the mean.
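Both properties of the CV, its independence of units and its instability near a zero mean, can be checked with a minimal sketch. The function name `coefficient_of_variation` and the data are hypothetical, and $s$ is assumed here to be the sample standard deviation (denominator $n - 1$), which this section does not specify.

```python
import statistics


def coefficient_of_variation(data):
    """CV(%) = (s / mean) * 100.

    Assumption: s is the sample standard deviation (n - 1 denominator).
    """
    mean = statistics.mean(data)
    if mean == 0:
        raise ZeroDivisionError("CV is undefined for a zero mean")
    s = statistics.stdev(data)
    return s / mean * 100.0


# Hypothetical rainfall depths expressed in two different units.
rain_mm = [10.0, 20.0, 30.0]
rain_m = [v / 1000.0 for v in rain_mm]
print(coefficient_of_variation(rain_mm))  # same CV in either unit,
print(coefficient_of_variation(rain_m))   # up to rounding error

# Hypothetical anomaly series whose mean is close to zero:
# the CV becomes extremely large.
anomalies = [-10.0, 0.1, 10.0]
print(coefficient_of_variation(anomalies))
```

The standard deviations of the two rainfall series differ by a factor of 1000, while their CV values agree, which is why the CV is preferred for comparisons across units.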
Hydrologic variables with larger CV values are more variable than those
with smaller values. Wilding (1985) suggested a classification scheme for
identifying the extent of variability for soil properties based on their CV