Geoscience Reference
In-Depth Information
or

$$\bar{x} = \bar{x}_{(j)} + \left(x_j - \bar{x}_{(j)}\right) \cdot \frac{1}{n} \qquad (4)$$
where $\bar{x}_{(j)}$ is the mean of all data values excluding $x_j$. Each observation's
influence on the overall mean $\bar{x}$ is $(x_j - \bar{x}_{(j)})$, the distance between the
observation and the mean excluding that observation. Hence, not all observations
have the same influence on the mean. An extreme (outlier) observation, either high
or low, has a much greater influence on the overall mean $\bar{x}$ than does a more
'typical' observation, one closer to its $\bar{x}_{(j)}$.
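As a minimal sketch of this idea, the Python snippet below computes the mean excluding each observation and the resulting influence for a small made-up dataset. The data values and the function names `leave_one_out_mean` and `influence` are assumptions introduced here for illustration only; they do not come from the source text.

```python
def leave_one_out_mean(values, j):
    """Mean of all values except the one at index j."""
    rest = values[:j] + values[j + 1:]
    return sum(rest) / len(rest)


def influence(values, j):
    """Distance between observation j and the mean excluding it."""
    return values[j] - leave_one_out_mean(values, j)


# Hypothetical data; 95.0 plays the role of the outlier.
data = [2.0, 4.0, 5.0, 6.0, 8.0, 95.0]
n = len(data)
overall_mean = sum(data) / n

for j, x_j in enumerate(data):
    mean_excl = leave_one_out_mean(data, j)
    # Rebuild the overall mean from the leave-one-out mean plus
    # the observation's influence scaled by 1/n.
    rebuilt = mean_excl + (x_j - mean_excl) / n
    assert abs(rebuilt - overall_mean) < 1e-9
    print(f"x_j={x_j:5.1f}  mean excluding x_j={mean_excl:6.2f}  "
          f"influence={x_j - mean_excl:7.2f}")
```

With these assumed values, the outlier has by far the largest influence in absolute terms, while the observations near the centre have much smaller ones.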
The influence of an extreme value or outlier can also be illustrated by noting that
the mean is the balance point of the data when each point is placed on a number
line (Fig. 2.1). Data points far from the central location apply a stronger
downward force than those closer to the centre. If a data point near the central
location is removed, the balance point needs only a small adjustment to keep the
dataset in balance. In contrast, if an outlier very far from the central location
is removed, the balance point shifts considerably (Fig. 2.2). This sensitivity to
the magnitudes of a small number of points in the dataset explains why the mean
is not a robust (resistant) measure of location. It is not resistant to changes in
the presence of, or to changes in the magnitudes of, a few outlier observations.
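The balance-point argument can be checked numerically. The sketch below, which uses an assumed sample and a hypothetical helper `shift_after_removal`, compares how far the mean moves when a central point is removed with how far it moves when the outlier is removed.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def shift_after_removal(values, j):
    """Change in the mean when the observation at index j is dropped."""
    remaining = values[:j] + values[j + 1:]
    return mean(remaining) - mean(values)


# Hypothetical sample; 55.0 is the outlier on the high side.
sample = [10.0, 12.0, 13.0, 14.0, 16.0, 55.0]

shift_central = shift_after_removal(sample, 3)  # drop 14.0, near the centre
shift_outlier = shift_after_removal(sample, 5)  # drop 55.0, the outlier

print(f"mean of full sample:        {mean(sample):.2f}")
print(f"shift after dropping 14.0:  {shift_central:+.2f}")
print(f"shift after dropping 55.0:  {shift_outlier:+.2f}")
```

For this assumed sample, dropping the high outlier moves the mean to the left by several units, whereas dropping the central point moves it only slightly.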
When this strong influence of a small number of observations in a dataset is
desirable, the mean is an appropriate measure of central location. This usually
occurs when computing units of mass, such as the average precipitation from
a number of sites in a raingauge network. A raingauge that records a high
rainfall amount exerts more influence (owing to the greater mass of rainfall) on
the final average rainfall amount than one that records a low amount.
Fig. 2.1. Mean shown by triangle acting as a balance point of time
series data (Helsel and Hirsch, 2002).
Fig. 2.2. Shift of the mean to the left after removal of the outlier.
2.1.2 Robust Measure: Median
The median is the middle value of a data series when the data are ranked in
order of magnitude. It is the 50th percentile ($P_{50}$) of the dataset. For a data series
 