Database Reference
In-Depth Information
For AR per stream (ignoring correlations), it is O ( n 2 ). However, for
SPIRIT, we need O ( kn ) space for the w i and, with one AR model per y i ,
the total space complexity is O ( kn + k 2 ). As published, MUSCLES re-
quires space that grows cubically with respect to the number of streams
n . We believe it can be made to work with quadratic space, but this is
still prohibitive. Both AR per stream and SPIRIT require space that
grows linearly with respect to n , but in SPIRIT k is typically very small
( k
n ) and, in practice, SPIRIT requires less memory and time per
update than AR per stream. More importantly, a single, independent
AR model per stream cannot capture any correlations, whereas SPIRIT
indirectly exploits the correlations present within a time tick.
Missing values When we have a forecasting model, we can use the
forecast based on x t− 1 to estimate missing values in x t .Wethenuse
these estimated missing values to update the weight estimates, as well
as the forecasting models. Forecast-based estimation of missing values
is the most time-ecient choice and gives very good results.
7.2 Interpretation
At any given time t , SPIRIT readily provides two key pieces of infor-
mation (aside from the forecasts, etc.): (i)The number of hidden vari-
ables k . (ii) The weights w i,j ,1
i
k ,1
j
n .Intu ivel ,
the magnitude
of each weight tells us how much the i -th hidden
variable contributes to the reconstruction of the j -th stream.
In the chlorine example during phase 1 (see Figure 5.1 ), the dataset
has only one hidden variable, because one sinusoidal-like pattern can
reconstruct both streams (albeit with different weights for each). Thus,
SPIRIT correctly identifies correlated streams. When the correlation
was broken, SPIRIT introduces enough hidden variables to capture that.
Finally, it also spots that, in phase 3, normal operation is reestablished
and thus disposes of the unnecessary hidden variable.
|
w i,j |
8. Pattern Discovery across Time
Many pattern discovery methods first project the data onto “all” bases
in a given family (e.g., Fourier, wavelets, etc) and then choose a few co-
ecients that capture the most information. In contrast, among all
possible bases, we first choose a few bases that are guaranteed to cap-
ture the most information and consequently project the data only onto
those. However, eciently determining these few bases and incremen-
tally updating them as new points arrive is a challenging problem. To
Search WWH ::




Custom Search