Beyond Aggregation - Real-Time Analytics


}

public ThresholdDetector(Forecaster f, int n) {
    // sigma defaults to 3 standard deviations
    this(f, n, 3);
}

public ThresholdDetector(Forecaster f) {
    // window size n defaults to 30 errors
    this(f, 30);
}

When a new observation arrives, it is run through the forecaster to determine the forecast error. If the absolute error exceeds sigma standard deviations, the observation is considered an outlier. In that case, its error is not used to update the standard deviation of the errors, so that outliers do not skew the estimate. Otherwise, the standard deviation calculation is updated to reflect the new, non-outlier value:

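public boolean observe(double y) {
    double err = y - f.forecast(y);
    // Only test once the window holds n squared errors. If s2 starts at 0,
    // sig is 0 and every nonzero error would be flagged, so s2 could never grow.
    if (values.size() == n) {
        double sig = Math.sqrt(s2 / (n - 1.0));
        // If this is an outlier, don't include it in s2
        if (Math.abs(err) > sigma * sig)
            return true;
        // Slide the window: drop the oldest squared error
        s2 -= values.removeFirst();
    }
    // Otherwise update our standard deviation
    s2 += err * err;
    values.add(err * err);
    return false;
}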
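To see the detector behave end to end, the following self-contained sketch wraps the same windowed update in a runnable class. The `Forecaster` interface here is a hypothetical stand-in matching the `f.forecast(y)` call (assumed to return the prediction for the current step), and the constant forecaster is purely illustrative. As an assumption not stated in the original, no value is tested until the window holds `n` squared errors.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ThresholdDetectorSketch {

    // Hypothetical stand-in for the book's Forecaster: forecast(y) returns
    // the prediction for the current step (and may update itself on y).
    interface Forecaster {
        double forecast(double y);
    }

    static class ThresholdDetector {
        private final Forecaster f;
        private final int n;          // window size (number of squared errors kept)
        private final double sigma;   // outlier threshold in standard deviations
        private final Deque<Double> values = new ArrayDeque<>();
        private double s2 = 0.0;      // sum of squared errors currently in the window

        ThresholdDetector(Forecaster f, int n, double sigma) {
            this.f = f;
            this.n = n;
            this.sigma = sigma;
        }

        // Returns true if y is flagged as an outlier.
        boolean observe(double y) {
            double err = y - f.forecast(y);
            // Assumed warm-up: only test once the window is full.
            if (values.size() == n) {
                double sig = Math.sqrt(s2 / (n - 1.0));
                if (Math.abs(err) > sigma * sig) {
                    return true; // outlier: window and s2 are left untouched
                }
                s2 -= values.removeFirst(); // drop the oldest squared error
            }
            s2 += err * err;
            values.addLast(err * err);
            return false;
        }
    }

    public static void main(String[] args) {
        // Constant forecaster that always predicts 10.0 (illustration only).
        ThresholdDetector d = new ThresholdDetector(y -> 10.0, 30, 3.0);
        for (int i = 0; i < 30; i++) {
            d.observe(i % 2 == 0 ? 9.0 : 11.0); // errors of -1 and +1 fill the window
        }
        System.out.println(d.observe(20.0)); // error 10 vs. 3 * sqrt(30/29)
        System.out.println(d.observe(11.0)); // error 1: an ordinary value
    }
}
```

Running `main` prints `true` and then `false`: the spike is flagged and excluded from `s2`, while the following ordinary value updates the window as usual.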

There are other approaches, but most of them use this same basic framework for their updates. For example, rather than using the standard deviation, many outlier detectors flag a value as an outlier if it lies more than 1.5 (or 3) times the interquartile range below the first quartile or above the third quartile. This rule was originally used to identify outliers in boxplot visualizations and has since been repurposed for outlier detection. Scan statistic approaches generalize it further, using percentiles of the error to determine whether the process is in an outlier state.
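The interquartile-range rule can be sketched in the same window-and-exclude style. The class below (`IqrDetectorSketch`, a hypothetical name) keeps the last `n` forecast errors, computes the first and third quartiles by linear interpolation (one of several common quartile definitions, chosen here as an assumption), and flags an error that falls outside the fences Q1 - k*IQR and Q3 + k*IQR. Flagged errors are not added to the window, mirroring how the standard-deviation detector keeps outliers out of `s2`.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class IqrDetectorSketch {
    private final int n;      // number of recent errors kept
    private final double k;   // fence multiplier, e.g. 1.5 or 3
    private final Deque<Double> window = new ArrayDeque<>();

    public IqrDetectorSketch(int n, double k) {
        this.n = n;
        this.k = k;
    }

    // Percentile of a sorted array by linear interpolation, p in [0, 1].
    static double percentile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    // Returns true if err lies outside [Q1 - k*IQR, Q3 + k*IQR] of the window.
    public boolean observe(double err) {
        if (window.size() == n) {
            double[] s = window.stream().mapToDouble(Double::doubleValue).toArray();
            Arrays.sort(s);
            double q1 = percentile(s, 0.25);
            double q3 = percentile(s, 0.75);
            double iqr = q3 - q1;
            if (err < q1 - k * iqr || err > q3 + k * iqr) {
                return true; // outlier: not added to the window
            }
            window.removeFirst(); // slide the window
        }
        window.addLast(err);
        return false;
    }

    public static void main(String[] args) {
        IqrDetectorSketch d = new IqrDetectorSketch(8, 1.5);
        double[] errors = {-1.0, 0.5, 0.0, 1.0, -0.5, 0.25, -0.25, 0.75};
        for (double e : errors) {
            d.observe(e); // fills the window
        }
        System.out.println(d.observe(5.0)); // far above Q3 + 1.5 * IQR
        System.out.println(d.observe(0.1)); // inside the fences
    }
}
```

With the eight warm-up errors in `main`, Q1 is -0.3125 and Q3 is 0.5625, so the fences are -1.625 and 1.875; the program prints `true` for 5.0 and `false` for 0.1. Re-sorting the window on every call keeps the sketch short; a production version would likely maintain an ordered structure incrementally.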
