DIMENSIONALITY REDUCTION AND FILTERING ON TIME SERIES SENSOR STREAMS - Managing and Mining Sensor Data

Database Reference

In-Depth Information

which maximize p 2 and minimize d . Assuming unit-variance ( d =1),

such x i is the one with the biggest correlation coecient to y .Th s

concludes the proof.

The question is how we should handle the case when b> 1. Normally,

we should consider all the possible groups of b independent variables,

and try to pick the best. This approach explodes combinatorially; thus

we propose to use a greedy algorithm. At each step s , we select the inde-

pendent variable x s that minimizes the EEE for the dependent variable

y , in light of the s

1 independent variables that we have already chosen

in the previous steps.

The bottleneck of the algorithm is clearly the computation of EEE.

Since it computes EEE approximately O ( v · b ) times and each computa-

tion of EEE requires O ( N ·b 2 ) in average, the overall complexity mounts

to O ( N · v · b 3 ). To reduce the overhead, we observe that intermediate

results produced for EEE( S ) can be re-used for EEE( S∪{x} ).

−

Lemma 5.3 The complexity of the greedy selection algorithm is O ( N

b 2 ) .

+ be

+ )istheinverse

S∪{

}

Proof. Let

. The core in computing EEE(

of D S + =( X T

S + X S + ). Thanks to block matrix inversion formula [34] (p.

656) and the availability of D − 1

from the previous iteration step, it can

2 ). Hence, summing it up over v

be computed in O ( N

·|S|

−|S|

b 2 + v

b 3 )

remaining variables for each b iteration, we have O ( N

b 2 ).

complexity. Since N

b , it reduces to O ( N

Subset-selection can be done infrequently and off-line, say every N = W

time-ticks. The optimal choice of the reorganization window is beyond

the scope of this chapter. Potential solutions include (a) doing reorga-

nization during off-peak hours, (b) triggering a reorganization whenever

the estimation error for by increases above an application-dependent

threshold etc. Also, by normalizing the training set, the unit-variance

assumption in Theorem 1 can be easily satisfied.

6. Tracking Correlations and Hidden Variables:

SPIRIT

In this section we present a framework for discovering patterns in

multiple streams. In the next section, we show how these can be used

to perform effective, low-cost forecasting. We use auto-regression for its

simplicity, but our framework allows any forecasting algorithm to take

advantage of the compact representation of the stream collection.

Managing and Mining Sensor Data

Search WWH ::

Custom Search

Home