Environmental Engineering Reference
outlier is identified by comparing D with D_critical shown in Table 2.5 (i.e., reject the
data point suspected of being an outlier if D > D_critical).
Table 2.5 Critical values for Dixon's test*

                            Risk of false rejection
Critical value    n      0.5%      1%        5%        10%
D10               3      0.994     0.988     0.941     0.886
                  4      0.926     0.899     0.765     0.679
                  5      0.821     0.780     0.642     0.557
                  6      0.740     0.698     0.560     0.482
                  7      0.680     0.637     0.507     0.434
D11               8      0.727     0.683     0.554     0.479
                  9      0.677     0.635     0.512     0.441
                  10     0.639     0.597     0.477     0.409
D21               11     0.713     0.679     0.576     0.517
                  12     0.675     0.642     0.546     0.490
                  13     0.649     0.615     0.521     0.467

*Reference: Robracher (1991)
Dixon's test is not as efficient as the Rosner test (recommended by the EPA) for
detecting multiple outliers in large data sets. If multiple outliers are suspected, the
least extreme value should be tested first and then the test repeated. However, the
power of the test decreases as the number of repetitions increases.
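The decision rule (reject if D > D_critical) can be sketched in Python as follows. This is a minimal illustration, not the book's procedure: the definitions of D10, D11, and D21 are not given in this excerpt, so the code assumes the commonly used Dixon ratios (gap between the suspect value and its neighbor, divided by a range). The function names and the sample data are hypothetical; only the critical values come from Table 2.5.

```python
# Critical values transcribed from Table 2.5, keyed by sample size n.
# Tuple positions match RISKS (risk of false rejection).
RISKS = (0.005, 0.01, 0.05, 0.10)
CRITICAL = {
    3: (0.994, 0.988, 0.941, 0.886),
    4: (0.926, 0.899, 0.765, 0.679),
    5: (0.821, 0.780, 0.642, 0.557),
    6: (0.740, 0.698, 0.560, 0.482),
    7: (0.680, 0.637, 0.507, 0.434),
    8: (0.727, 0.683, 0.554, 0.479),
    9: (0.677, 0.635, 0.512, 0.441),
    10: (0.639, 0.597, 0.477, 0.409),
    11: (0.713, 0.679, 0.576, 0.517),
    12: (0.675, 0.642, 0.546, 0.490),
    13: (0.649, 0.615, 0.521, 0.467),
}


def dixon_statistic(data, suspect="low"):
    """Return the Dixon ratio D for the lowest or highest value.

    ASSUMPTION: uses the commonly cited ratio definitions
    (D10 for n = 3-7, D11 for n = 8-10, D21 for n = 11-13).
    """
    x = sorted(data)
    n = len(x)
    if n not in CRITICAL:
        raise ValueError("Table 2.5 covers n = 3 to 13 only")
    if suspect == "high":
        # Mirror the data so the suspect value is always x[0].
        x = [-v for v in reversed(x)]
    if n <= 7:        # D10 = (x2 - x1) / (xn - x1)
        num, den = x[1] - x[0], x[-1] - x[0]
    elif n <= 10:     # D11 = (x2 - x1) / (x(n-1) - x1)
        num, den = x[1] - x[0], x[-2] - x[0]
    else:             # D21 = (x3 - x1) / (x(n-1) - x1)
        num, den = x[2] - x[0], x[-2] - x[0]
    if den == 0:
        raise ValueError("zero range: the ratio is undefined")
    return num / den


def is_outlier(data, suspect="low", risk=0.05):
    """Apply the rule from the text: reject if D > D_critical."""
    d_critical = CRITICAL[len(data)][RISKS.index(risk)]
    return dixon_statistic(data, suspect) > d_critical


# Hypothetical data: is the highest value (10.0) an outlier?
sample = [1.0, 2.0, 3.0, 4.0, 10.0]
print(dixon_statistic(sample, "high"), is_outlier(sample, "high", risk=0.05))
```

For this made-up sample, D = 6/9, which exceeds the 5% critical value for n = 5 (0.642) but not the 1% value (0.780), so the conclusion depends on the risk of false rejection the analyst accepts.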
2.2.5 Analysis of Censored Data
Suppose that the following mercury concentrations were obtained from several
representative samples of a drinking water supply: 2.5, <1.0, 1.9, 2.6 µg/L. The
analyst reported that the limit of quantitation is 1.0 µg/L. What are the mean and
standard deviation? Does the quality of the drinking water meet the regulatory
standard of 2.0 µg/L (the maximum contaminant level allowable in drinking water)?
Non-numerical results such as "not detected" or "less than" (the <1.0 value in the
example above) are so-called censored data. The mean and standard deviation of
such measurements cannot be computed directly, so a comparison with the legal
standard cannot be made. Censored data make it difficult or impossible to apply
typical statistical analyses, specifically the parametric tests commonly used for
hypothesis testing (e.g., comparisons of means or variances, or regression analyses).
Tests that can be applied also become less reliable as the amount of censoring
increases. Censoring can be a formidable problem in environmental analysis,
particularly for trace contaminants in waters, where as much as 80-95% of the
data set may be censored.
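To see why the mercury example has no single answer, the sketch below brackets the mean by letting the censored result take its two extreme possible values, 0 and the limit of quantitation. This bounding step is an illustration added here, not a method prescribed by the text, and the function name mean_bounds is hypothetical.

```python
# Mercury example from the text, in ug/L: 2.5, <1.0, 1.9, 2.6
quantified = [2.5, 1.9, 2.6]   # results reported as numbers
n_censored = 1                 # results reported as "<1.0"
loq = 1.0                      # limit of quantitation
mcl = 2.0                      # regulatory standard


def mean_bounds(quantified, n_censored, loq):
    """Return (lowest, highest) possible mean of the full data set.

    Each censored result is only known to lie between 0 and the LOQ,
    so the true mean lies somewhere between these two bounds.
    """
    n = len(quantified) + n_censored
    total = sum(quantified)
    lowest = total / n                         # censored values = 0
    highest = (total + n_censored * loq) / n   # censored values = LOQ
    return lowest, highest


low, high = mean_bounds(quantified, n_censored, loq)
print(f"mean is between {low:.2f} and {high:.2f} ug/L (standard: {mcl} ug/L)")
```

With these four results the mean can be anywhere from about 1.75 to about 2.0 µg/L, so the data alone do not settle how the supply compares with the 2.0 µg/L standard.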
How should we handle censored data? First, deletion of censored data is
probably the worst procedure and should never be used because it causes a large and
variable bias in the parameter estimates (Helsel, 1990). After deletion, comparisons