exploit either or both of these properties. Specifically, they first model
the dependency of one variable (e.g., the sensor value) on another (e.g.,
time), and then treat the resulting regression curves as the reference on
which the inferred sensor values lie. The two most popular regression-based
approaches use polynomial and Chebyshev regression for cleaning sensor
values.
Polynomial Regression: Polynomial regression finds the best-fitting
curve, i.e., the curve that minimizes the total (typically squared) difference
between the curve and each raw sensor value $v_{ij}$ at time $t_i$. Given a
degree $d$, polynomial regression is formally defined as:

$$v_{ij} = c + \alpha_1 \cdot t_i + \cdots + \alpha_d \cdot t_i^d, \qquad (2.9)$$
where $c$ is a constant and $\alpha_1, \ldots, \alpha_d$ are regression coefficients.
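To make Equation 2.9 concrete, the sketch below fits $c, \alpha_1, \ldots, \alpha_d$ under the assumption that "best-fitting" means ordinary least squares (the most common choice, though the text does not fix the error measure). It solves the normal equations with plain Gaussian elimination so that it needs no third-party libraries; the function names `fit_polynomial`, `evaluate_polynomial`, and `solve_linear_system` are hypothetical.

```python
from typing import List, Sequence

# Assumption: least-squares criterion; all function names here are hypothetical.

def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row swaps also move the right-hand side.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more than d distinct time values")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_polynomial(times: Sequence[float], values: Sequence[float], d: int) -> List[float]:
    """Return [c, alpha_1, ..., alpha_d] minimizing the sum of squared residuals."""
    size = d + 1
    # Normal equations (X^T X) w = X^T v, where X[i][k] = t_i ** k.
    xtx = [[sum(t ** (j + k) for t in times) for k in range(size)] for j in range(size)]
    xtv = [sum((t ** j) * v for t, v in zip(times, values)) for j in range(size)]
    return solve_linear_system(xtx, xtv)

def evaluate_polynomial(coeffs: Sequence[float], t: float) -> float:
    """Inferred value c + alpha_1*t + ... + alpha_d*t**d (Equation 2.9)."""
    return sum(a * t ** k for k, a in enumerate(coeffs))
```

For example, `fit_polynomial([0, 1, 2, 3, 4], [1, 3, 5, 7, 9], 1)` recovers approximately `[1.0, 2.0]`, i.e., $c = 1$ and $\alpha_1 = 2$. The normal equations become ill-conditioned as $d$ grows, so for high degrees a QR- or SVD-based solver would be the safer choice.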
Polynomial regression with a high degree approximates a given time series
with more sophisticated curves, yielding a theoretically more accurate
description of the raw sensor values. In practice, however, low-degree
polynomials, such as constant ($d = 0$) and linear ($d = 1$) ones, also perform
satisfactorily. In addition, low-degree polynomials can be constructed more
efficiently than high-degree ones. A (weighted) moving average model [73]
can also be regarded as a polynomial regression.
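One way to see why a moving average counts as polynomial regression: within each window, the constant $c$ minimizing $\sum_k w_k (v_k - c)^2$ is $c = \sum_k w_k v_k / \sum_k w_k$ (set the derivative with respect to $c$ to zero), which is exactly a weighted moving average, i.e., a local $d = 0$ fit. The sketch below assumes a trailing window and a squared-error criterion; the function name is hypothetical and the model in [73] may differ in its details.

```python
from typing import List, Sequence

def weighted_moving_average(values: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Trailing weighted moving average (hypothetical helper).

    Each output is the constant c minimizing sum_k w_k * (v_k - c) ** 2 over the
    window, i.e. a weighted degree-0 polynomial fit to that window.
    weights[-1] applies to the most recent value in the window.
    """
    w = len(weights)
    total = sum(weights)
    if w == 0 or total <= 0:
        raise ValueError("weights must be non-empty with a positive sum")
    out = []
    for end in range(w, len(values) + 1):
        window = values[end - w:end]
        # Closed-form weighted least-squares constant for this window.
        out.append(sum(wk * vk for wk, vk in zip(weights, window)) / total)
    return out
```

With equal weights this reduces to the simple moving average: `weighted_moving_average([1, 2, 3, 4], [1, 1])` yields `[1.5, 2.5, 3.5]`.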
Chebyshev Regression: Chebyshev regression is a popular model
class for fitting sensor values, since it can quickly compute near-optimal
approximations for a given time series. Suppose that the time values
$t_i$ vary within the range $[\min(t_i), \max(t_i)]$. We then obtain normalized
time values within the range $[-1, 1]$ by using the following transformation
function $f(t_i)$ and its inverse transformation function $f^{-1}(t_i)$:
$$f(t_i) = \left(t_i - \frac{\max(t_i) + \min(t_i)}{2}\right) \cdot \frac{2}{\max(t_i) - \min(t_i)}, \qquad (2.10)$$

$$f^{-1}(t_i) = t_i \cdot \frac{\max(t_i) - \min(t_i)}{2} + \frac{\max(t_i) + \min(t_i)}{2}. \qquad (2.11)$$
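Equations 2.10 and 2.11 are an affine map between $[\min(t_i), \max(t_i)]$ and $[-1, 1]$ and its inverse. A minimal sketch, with the hypothetical helper `make_time_normalizer` returning both functions for a given set of time values:

```python
from typing import Callable, Sequence, Tuple

def make_time_normalizer(
    times: Sequence[float],
) -> Tuple[Callable[[float], float], Callable[[float], float]]:
    """Return (f, f_inv) per Equations 2.10 and 2.11 (hypothetical helper)."""
    t_min, t_max = min(times), max(times)
    if t_max == t_min:
        raise ValueError("need at least two distinct time values")
    mid = (t_max + t_min) / 2
    half_range = (t_max - t_min) / 2

    def f(t: float) -> float:
        # Equation 2.10: (t - mid) * 2 / (t_max - t_min), maps onto [-1, 1].
        return (t - mid) / half_range

    def f_inv(x: float) -> float:
        # Equation 2.11: maps [-1, 1] back onto [t_min, t_max].
        return x * half_range + mid

    return f, f_inv
```

For `times = [10, 12, 20]`, `f` sends 10 to -1.0, 15 to 0.0, and 20 to 1.0, and `f_inv(f(t))` returns `t` up to floating-point rounding.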
Next, given a degree $d$, the Chebyshev polynomial is defined as:

$$v_{ij} = f^{-1}\left(\cos\left(d \cdot \cos^{-1}\left(f(t_i)\right)\right)\right).$$
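The core of this definition is $T_d(x) = \cos(d \cdot \cos^{-1}(x))$, which is only defined for $x \in [-1, 1]$; that is why the time values are first normalized by $f$. The sketch below evaluates $T_d$ both through this trigonometric form and through the standard three-term recurrence $T_0 = 1$, $T_1 = x$, $T_{k+1} = 2x\,T_k - T_{k-1}$, showing that the trigonometric form is indeed a polynomial in $x$. It covers only $T_d$, not the surrounding $f$/$f^{-1}$ composition or a full fitting procedure; both function names are hypothetical.

```python
import math

def chebyshev_trig(d: int, x: float) -> float:
    """T_d(x) = cos(d * arccos(x)); valid only for x in [-1, 1]."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("x must lie in [-1, 1]; normalize time values first")
    return math.cos(d * math.acos(x))

def chebyshev_recurrence(d: int, x: float) -> float:
    """The same T_d(x) via T_0 = 1, T_1 = x, T_{k+1} = 2x*T_k - T_{k-1}."""
    if d == 0:
        return 1.0
    prev, curr = 1.0, x
    for _ in range(d - 1):
        prev, curr = curr, 2 * x * curr - prev
    return curr
```

For the degree-2 case used in Figure 2.6, $T_2(x) = 2x^2 - 1$, so both functions return -0.5 at `x = 0.5` (up to rounding).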
Figure 2.6 illustrates a data cleaning process using degree-2 Chebyshev
polynomials. Here, the raw sensor values are plotted as green curves, while
the inferred values, obtained by fitting Chebyshev polynomials, are overlaid
as black curves. The anomaly points are then indicated by the underlying red
histograms as well as by red circles.
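The text does not state the rule used to mark anomaly points. As an assumption, one common choice is to compare each raw value with its inferred (fitted) value and flag residuals that deviate strongly from the typical residual, here measured robustly with the median absolute deviation (MAD). The function name `flag_anomalies` and the cutoff `k = 3.0` are hypothetical.

```python
import statistics
from typing import List, Sequence

def flag_anomalies(raw: Sequence[float], inferred: Sequence[float], k: float = 3.0) -> List[int]:
    """Return indices whose residual raw - inferred is an outlier (hypothetical rule).

    Assumption: a point is anomalous if its residual deviates from the median
    residual by more than k robust standard deviations (1.4826 * MAD).
    """
    if len(raw) != len(inferred) or not raw:
        raise ValueError("raw and inferred must be non-empty and equally long")
    residuals = [r - p for r, p in zip(raw, inferred)]
    med = statistics.median(residuals)
    deviations = [abs(e - med) for e in residuals]
    scale = 1.4826 * statistics.median(deviations)
    if scale == 0:
        # Degenerate case: most residuals are identical; flag any that differ.
        return [i for i, dev in enumerate(deviations) if dev > 0]
    return [i for i, dev in enumerate(deviations) if dev > k * scale]
```

For instance, with raw values `[1.0, 2.1, 2.9, 10.0, 5.05]` and inferred values `[1, 2, 3, 4, 5]`, only index 3 is flagged. The median and MAD are used instead of the mean and standard deviation because a single large spike would otherwise inflate the threshold and mask itself.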