Environmental Engineering Reference
In-Depth Information
having no mode. The main use of mode is limited to a quick measure of the central
value since little or no computation is needed.
Data variation, or dispersion, describes how spread out the data points are
from each other. Three common measures of dispersion are
variance, standard deviation, and range. The range of a set of data is the difference
between the maximum and minimum values. Like the mean, the range is influenced by
extreme (low-probability) observations, and its use as a measure of variation is
limited. The population variance, denoted as $\sigma^2$, is defined in terms of the population mean $\mu$ as:
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2 \qquad (2.16)$$
Sample variance ($s^2$) is the square of the standard deviation $s$ (Eq. 2.7). It is
calculated as:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \qquad (2.17)$$
In Eqs 2.16 and 2.17, N is the total population size and n the sample size. The term
$(x_i - \bar{x})$ in Eq. 2.17 describes the difference between each value of a sample and the
mean of that sample. This difference is then squared so that the differences of points
above and below the mean do not cancel each other out. The sample variance, $s^2$, is
an unbiased estimator of the population variance, $\sigma^2$, meaning that the values of $s^2$ tend
to center on the true value of $\sigma^2$ in a population. This concept is important for
inferential statistics. However, the unit of variance (units$^2$) is different from that of the
original data set, so variance must be used with care. Standard deviation is the square root of
variance, that is, $\sigma = (\sigma^2)^{1/2}$ for a population and $s = (s^2)^{1/2}$ for a sample.
Variance is an important quantity, particularly because variances are additive:
the overall variance for a process may be estimated by summing the individual
variances of its constituent parts. For example:
$$s^2_{\text{overall}} = s^2_{\text{field sampling}} + s^2_{\text{lab analysis}}$$
This is an important relationship when considering the sources of the overall
variability in a sampling and analytical process.
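As a minimal sketch of this additive relationship, assuming the field sampling and laboratory analysis errors are independent of each other, the following Python snippet combines two variance components and reports the share contributed by field sampling. The function name `overall_variance` and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def overall_variance(*component_variances):
    """Sum variance components, assuming the error sources are independent."""
    if any(v < 0 for v in component_variances):
        raise ValueError("a variance cannot be negative")
    return sum(component_variances)


# Hypothetical standard deviations, in the same concentration units
s_field = 4.0  # field sampling
s_lab = 3.0    # lab analysis

# Variances (not standard deviations) are what add up
var_overall = overall_variance(s_field ** 2, s_lab ** 2)
s_overall = math.sqrt(var_overall)

# Fraction of the overall variance attributable to field sampling
field_share = s_field ** 2 / var_overall

print(f"overall variance = {var_overall}, overall std = {s_overall}")
print(f"field sampling share of variance = {field_share:.2f}")
```

Note that the standard deviations themselves are not additive: in this hypothetical case the overall standard deviation is smaller than the simple sum of the two component standard deviations.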
Note that the descriptive statistics above are seldom calculated manually with a
calculator nowadays, particularly for large data sets. Readers should become
familiar with the Descriptive Statistics tool in an Excel spreadsheet. To use it,
first select Tools | Data Analysis, then select Descriptive Statistics from the
Analysis Tools list and click OK.
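For readers who work outside Excel, the same descriptive statistics can be sketched with Python's standard `statistics` module. The data values and the `describe` helper below are hypothetical. Note that `variance` and `stdev` use the n - 1 divisor of Eq. 2.17, whereas `pvariance` and `pstdev` use the N divisor of Eq. 2.16.

```python
import statistics


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        # Sample statistics: divisor n - 1 (Eq. 2.17)
        "sample_variance": statistics.variance(values),
        "sample_std": statistics.stdev(values),
        # Population statistics: divisor N (Eq. 2.16)
        "population_variance": statistics.pvariance(values),
        "population_std": statistics.pstdev(values),
    }


# Hypothetical measurements, for illustration only
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
summary = describe(data)

for name, value in summary.items():
    print(f"{name}: {value}")
```

The sample variance is always somewhat larger than the population variance computed from the same values, because the sum of squared deviations is divided by n - 1 rather than N.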
2.2.2 Understanding Probability Distributions
Normal (Gaussian) Distribution
This is a symmetrical, bell-shaped distribution of a given data set. Many
environmental data sets are skewed, and a normal distribution is obtained
after log-transformation of the original data. These data sets are termed as