DIMENSIONALITY REDUCTION AND FILTERING ON TIME SERIES SENSOR STREAMS - Managing and Mining Sensor Data

Database Reference

In-Depth Information

For AR per stream (ignoring correlations), it is O ( n 2 ). However, for

SPIRIT, we need O ( kn ) space for the w i and, with one AR model per y i ,

the total space complexity is O ( kn + k 2 ). As published, MUSCLES re-

quires space that grows cubically with respect to the number of streams

n . We believe it can be made to work with quadratic space, but this is

still prohibitive. Both AR per stream and SPIRIT require space that

grows linearly with respect to n , but in SPIRIT k is typically very small

( k

n ) and, in practice, SPIRIT requires less memory and time per

update than AR per stream. More importantly, a single, independent

AR model per stream cannot capture any correlations, whereas SPIRIT

indirectly exploits the correlations present within a time tick.

Missing values When we have a forecasting model, we can use the

forecast based on x t− 1 to estimate missing values in x t .Wethenuse

these estimated missing values to update the weight estimates, as well

as the forecasting models. Forecast-based estimation of missing values

is the most time-ecient choice and gives very good results.

7.2 Interpretation

At any given time t , SPIRIT readily provides two key pieces of infor-

mation (aside from the forecasts, etc.): (i)The number of hidden vari-

ables k . (ii) The weights w i,j ,1

≤

i

≤

k ,1

≤

j

≤

n .Intu ivel ,

the magnitude

of each weight tells us how much the i -th hidden

variable contributes to the reconstruction of the j -th stream.

In the chlorine example during phase 1 (see Figure 5.1 ), the dataset

has only one hidden variable, because one sinusoidal-like pattern can

reconstruct both streams (albeit with different weights for each). Thus,

SPIRIT correctly identifies correlated streams. When the correlation

was broken, SPIRIT introduces enough hidden variables to capture that.

Finally, it also spots that, in phase 3, normal operation is reestablished

and thus disposes of the unnecessary hidden variable.

|

w i,j |

8. Pattern Discovery across Time

Many pattern discovery methods first project the data onto “all” bases

in a given family (e.g., Fourier, wavelets, etc) and then choose a few co-

ecients that capture the most information. In contrast, among all

possible bases, we first choose a few bases that are guaranteed to cap-

ture the most information and consequently project the data only onto

those. However, eciently determining these few bases and incremen-

tally updating them as new points arrive is a challenging problem. To

Managing and Mining Sensor Data

Search WWH ::

Custom Search

Home