Environmental Engineering Reference
In-Depth Information
A Type I error in environmental analysis is to falsely report that an area is
contaminated whereas in fact it is not (false positive), and a Type II error is to falsely
report that an area is clean whereas it is in fact contaminated (false negative).
A false positive is the more conservative error, but it wastes resources on
cleaning up a site that does not need it. A false negative, by contrast, creates a
health and environmental hazard because the contaminant goes undetected.
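In hypothesis-testing notation, the two error types can be written as conditional
probabilities. The sketch below assumes that the null hypothesis is "the area is not
contaminated", which is implied by the definitions above but not stated there; the
symbols \alpha and \beta are the conventional names for the two error rates and are
not taken from the text.

```latex
\begin{align*}
H_0 &: \text{the area is not contaminated}, \qquad
H_a : \text{the area is contaminated} \\
\alpha &= P(\text{Type I error})
        = P(\text{reject } H_0 \mid H_0 \text{ true})
        \quad \text{(false positive)} \\
\beta  &= P(\text{Type II error})
        = P(\text{fail to reject } H_0 \mid H_0 \text{ false})
        \quad \text{(false negative)}
\end{align*}
```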
The next question is how to set up the hypothesis. An accepted convention in
hypothesis testing is to always write $H_0$ with an equality sign ($H_0 = \#$). If the
hypothesis is directional, then a one-tailed test is appropriate (i.e., $H_a > \#$ or
$H_a < \#$). Otherwise, use a two-tailed test, $H_a \neq \#$, in which $H_a$ does not
specify a departure from $H_0$ in a particular direction.
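To illustrate how the direction of the alternative hypothesis changes the result, here
is a minimal Python sketch. It assumes the test statistic follows a standard normal
distribution under the null hypothesis, which the text does not specify; the function
name tail_p_value and the example statistic are hypothetical.

```python
from statistics import NormalDist


def tail_p_value(z, alternative):
    """Return the p-value for a standard-normal test statistic z.

    alternative is "greater" (H_a > #), "less" (H_a < #),
    or "two-sided" (H_a != #).
    """
    cdf = NormalDist().cdf(z)  # P(Z <= z) for a standard normal Z
    if alternative == "greater":
        return 1.0 - cdf
    if alternative == "less":
        return cdf
    if alternative == "two-sided":
        # Departures in either direction count as evidence against H_0.
        return 2.0 * min(cdf, 1.0 - cdf)
    raise ValueError(f"unknown alternative: {alternative!r}")


z = 1.8  # hypothetical test statistic, for illustration only
for alt in ("greater", "less", "two-sided"):
    print(alt, round(tail_p_value(z, alt), 4))
```

For the same statistic, the two-sided p-value is twice the smaller of the two
one-tailed p-values, so a directional alternative reaches a given significance level
more easily in the stated direction.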
2.2.4 Detection of Outliers
Outliers are observations that appear to be inconsistent with the remainder of the
collected data. A rule of thumb is that one should never just throw data away without
an explanation or reason. One should first examine the following possible causes of
outliers: (1) The outlier may be a true outlier caused by a mistake such as a sampling
error, an analytical error (instrumental breakdown, calibration problem), or a
transcription, keypunch, or data-coding error. (2) The outlier may arise from inherent
spatial/temporal variation of the data or from unexpected factors of practical
importance such as malfunctioning pollutant effluent controls, spills, plant shutdown,
or hot spots. In these cases, the suspected outlier is not a true outlier.
Possible remedies for suspected outliers are: (1) To replace the incorrect data by
redoing the sampling and analysis, that is, to correct the mistake and insert the
correct value. (2) To remove the outlier using a statistical test. However, no data
should be discarded solely on the basis of a statistical test. (3) To retain the
outliers and use a more robust statistical method that is not seriously affected by
the presence of a few outliers. Discussed below are several major statistical tests to
identify outliers in small data sets. For large data sets (n > 25) or data that are not
normally distributed, the reader should consult other references for details.
Z-test
The mean and standard deviation of the entire data set are used to obtain the z-score
of each data point according to the following formula. The data point is rejected if
z > 3.

$$z = \frac{x - \bar{x}}{s} \qquad (2.24)$$
This test does not require any statistical table, but it is not very reliable because
the mean and standard deviation are themselves affected by the outlier. The z-test
also requires that the data set be normally distributed.
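The z-score screening described above can be sketched in a few lines of Python. This
is a minimal illustration, not a prescribed procedure: the function names and the
concentration values are hypothetical, the sample standard deviation (n - 1 divisor)
is assumed, and the absolute value of z is compared with the threshold so that
unusually low values are caught as well, whereas the text states the rule simply as
z > 3.

```python
from statistics import mean, stdev


def z_scores(values):
    """Return the z-score of each value, z = (x - mean) / s (Eq. 2.24)."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 divisor), an assumption
    return [(x - m) / s for x in values]


def flag_outliers(values, threshold=3.0):
    """Return the values whose |z| exceeds the threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Hypothetical concentrations: nine background readings and one suspect value.
background = [1.2, 1.3, 1.1, 1.4, 1.2, 1.3, 1.2, 1.1, 1.3]
small = background + [9.8]             # n = 10
large = background * 2 + [1.2, 9.8]    # n = 20

print("n = 10:", flag_outliers(small))
print("n = 20:", flag_outliers(large))
```

With these values the suspect reading 9.8 is flagged in the 20-point set but not in the
10-point set, because the outlier itself inflates the mean and the standard deviation.
When the sample standard deviation is used, no z-score can exceed (n - 1)/sqrt(n),
which is about 2.85 for n = 10, so the z > 3 rule cannot reject any point in a data
set of ten or fewer values.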