is very small, in our case we needed at least 6 measures. Intensive simulations
showed that 9 or more measures give good accuracy, and even 5 are not bad. So
we need at least 15 measures to start a reliable outlier detection. This number
looked too high to us, so we studied the possibility of including the seed points
in the KF procedure.
Reusing the seed raises the problem of statistical dependence, so we considered
obtaining the initial estimates from the seed with a regression framework other
than least squares and then testing the effects experimentally. For the initial
estimation the Repeated Median [15] was chosen; in median-based regression the
target is to minimize the sum of absolute residuals instead of the sum of
squares. The experiments showed a small increase in final estimation errors
when the seed is included, but also a significant increase in the ability to
detect outliers early. We therefore propose to include seeds in the KF procedure,
using median-based regression for the initial estimates.
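As an illustration of the initial estimation step, the following is a minimal sketch of a repeated median line fit. It assumes the common formulation in which the slope is the median, over points, of the median pairwise slope, and the intercept is the median of y - b*x; the function name `repeated_median_line` and the use of NumPy are our own choices for the example, and the seed is assumed to contain at least two distinct abscissae.

```python
import numpy as np

def repeated_median_line(x, y):
    """Fit y = a + b*x with a repeated median estimator (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    per_point_slopes = np.empty(n)
    for i in range(n):
        dx = x - x[i]
        mask = dx != 0  # skip the point itself and any repeated abscissa
        # Inner median: slopes from point i to every other point
        per_point_slopes[i] = np.median((y[mask] - y[i]) / dx[mask])
    # Outer median: one robust slope for the whole seed
    b = np.median(per_point_slopes)
    # Intercept taken as the median of the slope-corrected ordinates (assumption)
    a = np.median(y - b * x)
    return a, b

# Seed lying on y = 2 + 0.5*x with one gross outlier
x = np.arange(7.0)
y = 2.0 + 0.5 * x
y[3] = 100.0
a, b = repeated_median_line(x, y)
```

In this example the single gross outlier does not move the fit away from the underlying line, which is the property that makes such an estimator attractive for seeding the filter.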
Before starting the KF estimation procedure, a condition for segment end
detection has to be defined. Two types of outliers exist: proper outliers, points
belonging to the segment but somewhat far from it, and improper outliers, points
belonging to other segments. A single outlier is not enough to declare a segment
end, but waiting for two consecutive outliers is too restrictive and can fail to
detect many ends of connected segments when the noise variance is high. As an
intermediate solution, our proposed end detection condition is two outliers in
two or three consecutive measures, so that if α is the significance level for
outlier detection, then α² is, approximately, the significance level for end
detection.
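The end detection condition can be sketched as a scan over per-measure outlier flags. This is only an illustration: the function name `segment_end_index` and the boolean-list input are assumptions, and the outlier test that produces the flags is not shown.

```python
def segment_end_index(outlier_flags):
    """Return the index of the measure at which the end condition is first met.

    The condition is two outliers within two or three consecutive measures.
    Returns None if the condition is never met.
    """
    for k, is_outlier in enumerate(outlier_flags):
        if not is_outlier:
            continue
        # The current measure is an outlier: look for another outlier
        # among the previous two measures (window of at most three).
        if any(outlier_flags[max(0, k - 2):k]):
            return k
    return None

isolated = segment_end_index([False, True, False, False, True, False])
with_gap = segment_end_index([False, True, False, True])
consecutive = segment_end_index([True, True])
```

Two outliers separated by two regular measures do not trigger the condition, whereas two consecutive outliers, or two outliers with a single regular measure between them, do.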
To stop the estimation, segment ends have to be found at both sides of the seed.
We had to impose two more stop conditions. First, if the process has reached
the beginning of the previous seed or the end of the next one, estimation is
stopped at the corresponding side of the seed. These stop points are set to
avoid excessive overlapping between connected segments as a result of high noise
levels. Second, if the estimated noise standard deviation is higher than a limit,
the filter is considered out of control; the limit was set to
2 0 . 1, equivalent to 1 m.
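The two additional stop conditions can be expressed as a small check evaluated after each filter update. The sketch below is hypothetical: the function name, the index-based representation of seeds, and the parameter `sigma_limit` (left as an argument rather than a fixed value) are all assumptions made for illustration.

```python
def stop_reason(index, direction, prev_seed_start, next_seed_end,
                sigma_est, sigma_limit):
    """Return a label for the stop condition that applies, or None.

    index           -- position of the measure just processed
    direction       -- -1 when extending backwards, +1 when extending forwards
    prev_seed_start -- first index of the previous seed, or None if there is none
    next_seed_end   -- last index of the next seed, or None if there is none
    sigma_est       -- current estimate of the noise standard deviation
    sigma_limit     -- hypothetical threshold above which the filter is rejected
    """
    if sigma_est > sigma_limit:
        return "out_of_control"
    if direction < 0 and prev_seed_start is not None and index <= prev_seed_start:
        return "reached_previous_seed"
    if direction > 0 and next_seed_end is not None and index >= next_seed_end:
        return "reached_next_seed"
    return None

r1 = stop_reason(10, -1, 10, 50, sigma_est=0.02, sigma_limit=0.5)
r2 = stop_reason(30, +1, 10, 50, sigma_est=0.02, sigma_limit=0.5)
r3 = stop_reason(30, +1, 10, 50, sigma_est=0.9, sigma_limit=0.5)
```

These checks complement the outlier-based end detection; they do not replace it.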
When all KF estimations are concluded, a search is performed for holes, that is,
groups of points not belonging to any segment. One new segment is estimated for
each hole. At this point overlaps between segments may exist, so we need a
criterion for assigning each group of consecutive points to one segment. The
most accurate criterion we have found for resolving overlaps is to compare the
noise variance estimated by each segment at the moment the point was added to
its estimation procedure. The sign of the difference between the variances is
computed at the overlap center, and the first change in that sign is then
searched for on both sides of the center; if it is found, the overlap is split
there, and if not, one of the segments gets the whole overlap. When the overlaps
are resolved a new search for holes is performed, but in this case the
estimations are restricted to the limits of each hole.
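The hole search and the overlap rule can be sketched as follows. This is one possible reading, not a definitive implementation. The names `find_holes` and `split_overlap` are hypothetical. We assume that the left segment precedes the right one along the scan, that the segment with the lower variance at the center keeps the center, and that only the sign change on the side of the other segment is used, so that the split stays contiguous.

```python
import numpy as np

def find_holes(assigned):
    """Return (start, stop) pairs, stop exclusive, for runs of unassigned points."""
    holes, start = [], None
    for k, is_assigned in enumerate(assigned):
        if not is_assigned and start is None:
            start = k                      # a hole opens here
        elif is_assigned and start is not None:
            holes.append((start, k))       # the hole closes before k
            start = None
    if start is not None:
        holes.append((start, len(assigned)))
    return holes

def split_overlap(var_left, var_right):
    """Return p: overlap points [0, p) go to the left segment, [p, m) to the right.

    var_left[k] and var_right[k] are the noise variance estimates each segment
    had when overlap point k was added to its estimation.
    """
    d = np.sign(np.asarray(var_left, float) - np.asarray(var_right, float))
    m = len(d)
    c = m // 2                             # overlap center
    s = d[c]
    if s <= 0:
        # Left segment fits at least as well at the center: search to the right
        change = next((k for k in range(c + 1, m) if d[k] != s), None)
        return change if change is not None else m
    # Right segment fits better at the center: search to the left
    change = next((k for k in range(c - 1, -1, -1) if d[k] != s), None)
    return change + 1 if change is not None else 0

holes = find_holes([True, False, False, True, False])
p_split = split_overlap([1, 1, 1, 3, 3], [2, 2, 2, 2, 2])
p_all_left = split_overlap([1, 1, 1], [2, 2, 2])
```

When no sign change is found, the sketch gives the whole overlap to the segment with the lower variance at the center; the text leaves open which segment receives it, so this is an assumption.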
Especially when noise is high, the clustering procedure could yield more than
one seed for a “true” segment, so we need a procedure for merging similar
segments. We have selected Chow's test [7], from regression theory, designed for comparing