DIMENSIONALITY REDUCTION AND FILTERING ON TIME SERIES SENSOR STREAMS - Managing and Mining Sensor Data


by using only a small number of carefully selected other sequences. We

can thus do some preprocessing of a training set, to find a promising

subset of sequences, and to apply MUSCLES only to those (hence the

name Selective MUSCLES).

Assume that sequence $i$ is the one notoriously delayed and we need
to estimate its “delayed” values $x_{t,i}$. For a given tracking window span
$W$, among the $v = W \cdot n + n - 1$ independent variables, we have to
choose the ones that are most useful in estimating the delayed value of
$x_{t,i}$. More generally, we want to solve the following:

Problem 5.1 (Subset selection) Given $v$ independent variables
$x_1, x_2, \ldots, x_v$ and a dependent variable $y$ with $N$ samples each, find the
best $b$ ($b < v$) independent variables to minimize the mean-square error
for $y$ for the given samples.
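To make the variable count concrete, the following minimal sketch builds the sample matrix for one delayed sequence. It assumes that the $v$ independent variables are the $W$ most recent past values of all $n$ sequences plus the current values of the other $n - 1$ sequences; this layout matches the count $v = W \cdot n + n - 1$ but is an assumption of the sketch, and `build_design` is a hypothetical helper name.

```python
def build_design(seqs, i, W):
    """Build (X, y) for estimating sequence i at every tick t >= W.

    seqs[j][t] is the value of sequence j at tick t (n equal-length lists).
    Assumed row layout: the W most recent past values of every sequence
    (W * n values), followed by the current values of the other n - 1
    sequences.
    """
    n = len(seqs)
    T = len(seqs[0])
    X, y = [], []
    for t in range(W, T):
        row = []
        for j in range(n):
            row.extend(seqs[j][t - W:t])   # past W values of sequence j
        for j in range(n):
            if j != i:
                row.append(seqs[j][t])     # current value of every other sequence
        X.append(row)
        y.append(seqs[i][t])               # the delayed value to be estimated
    return X, y


# Toy data: n = 3 sequences of length 8, window span W = 2.
seqs = [[float(t + 10 * j) for t in range(8)] for j in range(3)]
W = 2
X, y = build_design(seqs, i=0, W=W)
v = len(X[0])
print(v, W * len(seqs) + len(seqs) - 1)    # both counts should agree
```

Each row of `X` is one sample of the $v$ independent variables and the matching entry of `y` is the dependent variable, so subset selection amounts to choosing $b$ columns of `X`.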

We need a measure of goodness to decide which subset of b variables

is the best we can choose. Ideally, we should choose the best subset

that yields the smallest estimation error in the future. Since, however,

we don't have future samples, we can only infer the expected estimation

error (EEE for short) from the available samples as follows:

$$\mathrm{EEE}(S) = \sum_{t=1}^{N} \left( y[t] - y_S[t] \right)^2$$

where $S$ is the selected subset of variables and $y_S[t]$ is the estimation
based on $S$ for the $t$-th sample. Note that, thanks to Eq. 5.3, EEE($S$)
can be computed in $O(N \cdot |S|^2)$ time. Let's say that we are allowed
to keep only $b = 1$ independent variable. Which one should we choose?
Intuitively, we could try the one that has the highest (in absolute value)
correlation coefficient with $y$. It turns out that this is indeed optimal.
(To satisfy the unit variance assumption, we will normalize samples by
the sample variance within the window.)
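The EEE measure can be sketched in a few lines. The example below is an illustration only: it assumes the estimate for a subset is the ordinary least-squares fit obtained from the normal equations, and the helper names `solve` and `eee`, as well as the toy data, are hypothetical.

```python
def solve(A, b):
    """Solve the square system A a = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [b[r]] for r, row in enumerate(A)]   # augmented matrix
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for q in range(c, k + 1):
                M[r][q] -= f * M[c][q]
    a = [0.0] * k
    for r in reversed(range(k)):                       # back-substitution
        s = sum(M[r][q] * a[q] for q in range(r + 1, k))
        a[r] = (M[r][k] - s) / M[r][r]
    return a


def eee(X, y, S):
    """Sum of squared residuals of the least-squares fit of y on the columns S of X."""
    XS = [[row[j] for j in S] for row in X]
    k = len(S)
    # Normal equations (XS^T XS) a = XS^T y; forming them takes O(N * |S|^2) work.
    G = [[sum(r[p] * r[q] for r in XS) for q in range(k)] for p in range(k)]
    g = [sum(r[p] * yt for r, yt in zip(XS, y)) for p in range(k)]
    a = solve(G, g)
    return sum((yt - sum(ap * xp for ap, xp in zip(a, r))) ** 2
               for r, yt in zip(XS, y))


# Toy data in which y = 2 * x0 - x2 exactly, so the subset {0, 2} fits perfectly.
X = [[1, 0, 2], [2, 1, 0], [3, 0, 1], [4, 1, 3], [5, 0, 0]]
y = [2 * r[0] - r[2] for r in X]
print(eee(X, y, [0, 2]), eee(X, y, [0]))
```

On this toy data the subset `[0, 2]` should give an error of essentially zero, while the single variable `[0]` leaves a visible residual, which is the comparison subset selection relies on.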

Lemma 5.2 Given a dependent variable $y$ and $v$ independent variables
with unit variance, the best single variable to keep to minimize EEE($S$)
is the one with the highest absolute correlation coefficient with $y$.
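The lemma can be checked numerically on a small example. The sketch below assumes the samples are shifted to zero mean as well as scaled to unit variance (the text only mentions the variance), and all names and data values are made up for illustration.

```python
from math import sqrt


def standardize(col):
    """Shift to zero mean and scale to unit sample variance (assumed normalization)."""
    n = len(col)
    m = sum(col) / n
    sd = sqrt(sum((c - m) ** 2 for c in col) / n)
    return [(c - m) / sd for c in col]


def single_eee(x, y):
    """EEE of the least-squares fit y ~ a * x for one variable x."""
    d = sum(c * c for c in x)                 # squared norm of x
    p = sum(c * t for c, t in zip(x, y))      # inner product of x and y
    a = p / d                                 # least-squares coefficient
    return sum((t - a * c) ** 2 for c, t in zip(x, y))


def corr(x, y):
    """Correlation coefficient of two zero-mean vectors."""
    p = sum(c * t for c, t in zip(x, y))
    return p / sqrt(sum(c * c for c in x) * sum(t * t for t in y))


# Three hypothetical candidate variables and one dependent variable.
cols = [[1, 2, 3, 4, 5, 6], [2, 1, 4, 3, 6, 5], [5, 1, 4, 2, 6, 3]]
y_raw = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]

xs = [standardize(c) for c in cols]
y = standardize(y_raw)
errors = [single_eee(x, y) for x in xs]
corrs = [abs(corr(x, y)) for x in xs]

best_by_error = min(range(len(xs)), key=lambda i: errors[i])
best_by_corr = max(range(len(xs)), key=lambda i: corrs[i])
print(best_by_error, best_by_corr)
```

Both selection rules should pick the same index here. A single example is not a proof, but it shows the ranking by error and the ranking by absolute correlation agreeing once every candidate has the same variance.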

Proof. For a single variable, if $a$ is the least squares solution, we can
express the error in matrix form as

$$\mathrm{EEE}(\{x_i\}) = \|y\|^2 - 2a\,(y^T x_i) + a^2 \|x_i\|^2 .$$

Let $d$ and $p$ denote $\|x_i\|^2$ and $(x_i^T y)$, respectively. Since $a = d^{-1} p$,

$$\mathrm{EEE}(\{x_i\}) = \|y\|^2 - p^2 d^{-1} .$$

To minimize the error, we must choose $x_i$
