4.1.3 Stream Queries
Queries about streams are asked in two ways. Fig. 4.1 shows a place within the processor
where standing queries are stored. These queries are, in a sense, permanently executing,
and they produce outputs at appropriate times.
EXAMPLE 4.1 The stream produced by the ocean-surface-temperature sensor mentioned at
the beginning of Section 4.1.2 might have a standing query to output an alert whenever the
temperature exceeds 25 degrees centigrade. This query is easily answered, since it depends
only on the most recent stream element.
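Because the alert depends only on the current element, the standing query needs no stored state at all. The following is a minimal sketch under that reading; the function name temperature_alert, the constant ALERT_THRESHOLD_C and the message format are hypothetical, chosen only for illustration.

```python
from typing import Optional

# Hypothetical threshold matching the 25-degree example in the text.
ALERT_THRESHOLD_C = 25.0


def temperature_alert(reading: float, threshold: float = ALERT_THRESHOLD_C) -> Optional[str]:
    """Standing query: return an alert message if this reading exceeds the threshold.

    Only the current stream element is examined, so nothing is kept between calls.
    Returns None when no alert is due.
    """
    if reading > threshold:
        return f"ALERT: {reading:.1f} C exceeds {threshold:.1f} C"
    return None
```

The processor would call this once per arriving reading and emit the message whenever the result is not None.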
Alternatively, we might have a standing query that, each time a new reading arrives,
produces the average of the 24 most recent readings. That query can also be answered easily,
if we store the 24 most recent stream elements. When a new stream element arrives, we
can drop from the working store the 25th most recent element, since it will never again be
needed (unless there is some other standing query that requires it).
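The working store for this query can be sketched as a bounded queue of the 24 most recent readings plus a running sum, so each arrival costs constant time. This is one possible sketch, not a prescribed implementation; the class name RecentAverage is hypothetical, and it assumes no other standing query needs the evicted element.

```python
from collections import deque
from typing import Deque


class RecentAverage:
    """Standing query: average of the n most recent readings (n = 24 in the example).

    Hypothetical helper for illustration only.
    """

    def __init__(self, n: int = 24) -> None:
        if n <= 0:
            raise ValueError("window size must be positive")
        # A deque with maxlen=n discards its oldest element automatically on append.
        self.window: Deque[float] = deque(maxlen=n)
        self.total = 0.0  # running sum of the readings currently in the window

    def add(self, reading: float) -> float:
        """Record a new reading and return the average of the readings in the window."""
        if len(self.window) == self.window.maxlen:
            # The oldest reading is about to become the (n+1)st most recent and be
            # evicted by append(), so remove its contribution from the running sum.
            self.total -= self.window[0]
        self.window.append(reading)
        self.total += reading
        return self.total / len(self.window)
```

Keeping a running sum avoids re-adding all n readings on every arrival; for small n, simply returning sum(self.window) / len(self.window) would work equally well and avoids any accumulated floating-point drift.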
Another query we might ask is the maximum temperature ever recorded by that sensor.
We can answer this query by retaining a simple summary: the maximum of all stream elements
ever seen. It is not necessary to record the entire stream. When a new stream element
arrives, we compare it with the stored maximum, and set the maximum to whichever is larger.
We can then answer the query by producing the current value of the maximum.
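The one-value summary described above can be sketched as follows; the class name RunningMax is hypothetical, and representing "no readings yet" as None is an assumption the text does not address.

```python
from typing import Optional


class RunningMax:
    """Standing query: maximum of all stream elements ever seen (hypothetical helper)."""

    def __init__(self) -> None:
        self.maximum: Optional[float] = None  # None until the first reading arrives

    def add(self, reading: float) -> None:
        # Compare the new element with the stored maximum and keep the larger.
        if self.maximum is None or reading > self.maximum:
            self.maximum = reading

    def answer(self) -> Optional[float]:
        return self.maximum
```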
Similarly, if we want the average temperature over all time, we have only to record two
values: the number of readings ever sent in the stream and the sum of those readings. We
can adjust these values easily each time a new reading arrives, and we can produce their
quotient as the answer to the query.
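The two-value summary (count and sum) can be sketched in the same style; RunningAverage is a hypothetical name, and returning None before any reading has arrived is an assumption, since the quotient is undefined then.

```python
from typing import Optional


class RunningAverage:
    """Standing query: average of all readings ever seen, kept as (count, sum)."""

    def __init__(self) -> None:
        self.count = 0    # number of readings ever received
        self.total = 0.0  # sum of those readings

    def add(self, reading: float) -> None:
        # Each arrival updates both values in constant time.
        self.count += 1
        self.total += reading

    def answer(self) -> Optional[float]:
        # The answer is the quotient sum / count; undefined before the first reading.
        return self.total / self.count if self.count else None
```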
The other form of query is ad-hoc, a question asked once about the current state of a
stream or streams. If we do not store all streams in their entirety, as we normally cannot,
then we cannot expect to answer arbitrary queries about streams. If we have some idea what
kind of queries will be asked through the ad-hoc query interface, then we can prepare for
them by storing appropriate parts or summaries of streams as in Example 4.1.
If we want the facility to ask a wide variety of ad-hoc queries, a common approach is to
store a sliding window of each stream in the working store. A sliding window can be the
most recent n elements of a stream, for some n, or it can be all the elements that arrived
within the last t time units, e.g., one day. If we regard each stream element as a tuple, we
can treat the window as a relation and query it with any SQL query. Of course, the
stream-management system must keep the window fresh, deleting the oldest elements as new
ones come in.
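A time-based window can be sketched as a relation that is trimmed on every arrival and then queried with ordinary SQL. In the sketch below, Python's standard sqlite3 module stands in for the stream-management system's working store; the class TimeWindow, the table readings and its columns ts and temp are hypothetical, and it assumes elements arrive in timestamp order so that the newest timestamp defines the window's end.

```python
import sqlite3


class TimeWindow:
    """Keep only stream elements that arrived within the last t time units."""

    def __init__(self, t: float) -> None:
        if t <= 0:
            raise ValueError("window length must be positive")
        self.t = t
        # In-memory database acting as the working store for one stream.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE readings (ts REAL NOT NULL, temp REAL NOT NULL)")

    def add(self, ts: float, temp: float) -> None:
        """Insert a new element, then delete elements that have fallen out of the window."""
        self.conn.execute("INSERT INTO readings (ts, temp) VALUES (?, ?)", (ts, temp))
        # Assumes arrivals are in timestamp order: anything at or before ts - t is stale.
        self.conn.execute("DELETE FROM readings WHERE ts <= ?", (ts - self.t,))

    def query(self, sql: str, params: tuple = ()) -> list:
        """Run an ad-hoc SQL query against the current window."""
        return self.conn.execute(sql, params).fetchall()
```

For example, after w = TimeWindow(t=24.0) and a few calls to w.add(...), an ad-hoc question such as w.query("SELECT MAX(temp), AVG(temp) FROM readings") is answered over the window alone. A count-based window of the n most recent elements would instead delete all but the n rows with the largest timestamps.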