In this simple example, we first need to discover the correct number of
hidden variables, which may change over time. Under normal operation,
only one hidden variable is needed, which corresponds to the periodic
pattern (Figure 5.1b, top). Both observed variables follow this hidden
variable (multiplied by a constant factor, which is the participation
weight of each observed variable into the particular hidden variable).
Mathematically, the hidden variables are the principal components of
the observed variables and the participation weights are the entries of
the principal direction vectors (more precisely, this is true under certain
assumptions, which will be explained later).
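The relationship between hidden variables and principal components can be illustrated with a small sketch. It is not the method developed later in the text, only batch PCA via NumPy's SVD on synthetic data. The sine pattern, the noise level, and the participation weights (0.8 and 0.6) are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical data: two observed streams that follow one periodic trend.
t = np.arange(500)
hidden = np.sin(2 * np.pi * t / 50)       # the single hidden variable
weights = np.array([0.8, 0.6])            # assumed participation weights (unit norm)
rng = np.random.default_rng(0)
X = np.outer(hidden, weights) + 0.01 * rng.standard_normal((500, 2))

# The principal directions are the right singular vectors of the data matrix.
# No mean-centering is applied; the sine already averages to zero here.
_, s, Vt = np.linalg.svd(X, full_matrices=False)
direction = Vt[0] * np.sign(Vt[0, 0])     # resolve the sign ambiguity of the SVD
energy = s**2 / np.sum(s**2)              # fraction of energy per component

# Projecting onto the first direction recovers the hidden variable.
recovered_hidden = X @ direction
```

In this setup the first component should carry almost all of the energy, and the entries of `direction` should be close to the assumed weights, which is the sense in which participation weights are entries of a principal direction vector.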
However, during the leak, a second trend is detected and a new hidden
variable is introduced (Figure 5.1b, bottom). As soon as the leak is fixed,
the number of hidden variables returns to one. If we examine the hidden
variables, the interpretation is straightforward: The first one still reflects
the periodic demand pattern in the sections of the network under normal
operation. All nodes in this section of the network have a participation
weight of ≈ 1 to the “periodic trend” hidden variable and ≈ 0 to the
new one. The second hidden variable represents the additive effect of
the catastrophic event, which is to cancel out the normal pattern. The
nodes close to the leak have participation weights ≈ 0.5 to both hidden
variables.
Summarizing this example, we find that (Figure 5.1): (i) Under normal
operation (phases 1 and 3), there is one trend. The corresponding
hidden variable follows a periodic pattern and all nodes participate in
this trend. All is well. (ii) During the leak (phase 2), there is a second
trend, trying to cancel the normal trend. The nodes with non-zero
participation to the corresponding hidden variable can be immediately
identified (e.g., they are close to a construction site). An abnormal
event may have occurred in the vicinity of those nodes, which should be
investigated.
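The three phases can be mimicked with a rough sketch that counts how many hidden variables each time window needs. The criterion used here (keep the fewest principal components that retain 95% of the window's energy) is an assumption for illustration, as are the four synthetic nodes, the step-shaped leak effect, and the helper name `count_hidden_variables`. Windows are processed in batch, not incrementally.

```python
import numpy as np

def count_hidden_variables(window, energy_threshold=0.95):
    """Fewest principal components that retain the given fraction of energy."""
    s = np.linalg.svd(window, compute_uv=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cumulative, energy_threshold)) + 1

rng = np.random.default_rng(0)
t = np.arange(900)                         # phases: [0,300), [300,600), [600,900)
periodic = np.sin(2 * np.pi * t / 50)      # normal demand pattern
leak = np.where((t >= 300) & (t < 600), -0.8, 0.0)   # hypothetical leak effect

far = np.outer(periodic, [1.0, 1.0])           # two nodes away from the leak
near = np.outer(periodic + leak, [1.0, 1.0])   # two nodes close to the leak
X = np.hstack([far, near]) + 0.01 * rng.standard_normal((900, 4))

# One count per non-overlapping window of 100 ticks.
counts = [count_hidden_variables(X[i:i + 100]) for i in range(0, 900, 100)]
```

With these assumed signals, the windows in phases 1 and 3 need a single hidden variable, while the windows in phase 2 need two.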
Matters are further complicated when there are hundreds or thousands
of nodes and more than one demand pattern. However, as we show later,
it is still possible to extract the key trends from the stream collection,
follow trend drifts and immediately detect outliers and abnormal events.
Besides providing a concise summary of key trends/correlations among
streams, correlations can be used to successfully deal with missing values
and the discovered hidden variables can be used to do very efficient,
resource-economic forecasting.
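One simple way to see how correlations help with missing values is sketched below. It assumes a single hidden variable whose participation weights are already known, estimates that hidden variable by least squares from the streams that did report a value, and reconstructs the missing entries from it. The function name `fill_missing` and the numbers are hypothetical, and this is not necessarily the procedure described later in the text.

```python
import numpy as np

def fill_missing(x, weights):
    """Estimate the NaN entries of one tick from a single hidden variable."""
    x = np.asarray(x, dtype=float)
    observed = ~np.isnan(x)
    w_obs = weights[observed]
    # Least-squares estimate of the hidden variable from observed entries only.
    y = (w_obs @ x[observed]) / (w_obs @ w_obs)
    filled = x.copy()
    filled[~observed] = y * weights[~observed]
    return filled

weights = np.array([0.8, 0.6])       # assumed to have been learned earlier
tick = np.array([0.4, np.nan])       # the second stream is missing at this tick
filled_tick = fill_missing(tick, weights)
```

Here the hidden variable is estimated as 0.4 / 0.8 = 0.5, so the missing reading is reconstructed as 0.5 × 0.6 = 0.3.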
There are several other applications and domains in which correlation
analysis and anomaly detection can be fruitfully combined. For example,
(i) given more than 50,000 securities trading in the US, on a
second-by-second basis, detect patterns and correlations [62], (ii) given traffic