Database Reference
In-Depth Information
which maximize p 2 and minimize d . Assuming unit-variance ( d =1),
such x i is the one with the biggest correlation coecient to y .Th s
concludes the proof.
The question is how we should handle the case when b> 1. Normally,
we should consider all the possible groups of b independent variables,
and try to pick the best. This approach explodes combinatorially; thus
we propose to use a greedy algorithm. At each step s , we select the inde-
pendent variable x s that minimizes the EEE for the dependent variable
y , in light of the s
1 independent variables that we have already chosen
in the previous steps.
The bottleneck of the algorithm is clearly the computation of EEE.
Since it computes EEE approximately O ( v · b ) times and each computa-
tion of EEE requires O ( N ·b 2 ) in average, the overall complexity mounts
to O ( N · v · b 3 ). To reduce the overhead, we observe that intermediate
results produced for EEE( S ) can be re-used for EEE( S∪{x} ).
Lemma 5.3 The complexity of the greedy selection algorithm is O ( N
·
b 2 ) .
v
·
+ be
+ )istheinverse
S
S∪{
x
}
S
Proof. Let
. The core in computing EEE(
of D S + =( X T
S + X S + ). Thanks to block matrix inversion formula [34] (p.
656) and the availability of D 1
from the previous iteration step, it can
S
2 ). Hence, summing it up over v
be computed in O ( N
·|S|
+
|
S
|
−|S|
b 2 + v
b 3 )
remaining variables for each b iteration, we have O ( N
·
v
·
·
b 2 ).
complexity. Since N
b , it reduces to O ( N
·
v
·
Subset-selection can be done infrequently and off-line, say every N = W
time-ticks. The optimal choice of the reorganization window is beyond
the scope of this chapter. Potential solutions include (a) doing reorga-
nization during off-peak hours, (b) triggering a reorganization whenever
the estimation error for by increases above an application-dependent
threshold etc. Also, by normalizing the training set, the unit-variance
assumption in Theorem 1 can be easily satisfied.
6. Tracking Correlations and Hidden Variables:
SPIRIT
In this section we present a framework for discovering patterns in
multiple streams. In the next section, we show how these can be used
to perform effective, low-cost forecasting. We use auto-regression for its
simplicity, but our framework allows any forecasting algorithm to take
advantage of the compact representation of the stream collection.
Search WWH ::




Custom Search