Database Reference
ratio (CSR), which measures the percentage of total query costs saved due to hits in the cache. Because of the chunking organization in (Deshpande et al., 1998), a simple metric of coverage of base tables is adopted for measuring the benefit of cached chunks.

Recency metrics of caching are well studied in the literature. One well-known strategy is Least Recently Used (LRU), which discards the least recently accessed cached data. The strategy was extended to LRU-K by O'Neil et al. (1993) to take advantage of recent access patterns. Deshpande et al. (1998) utilize the CLOCK algorithm (Silberschatz et al., 2002), an efficient approximation of LRU.
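As a concrete illustration, the CLOCK approximation can be sketched as follows. This is a minimal sketch, not an implementation from the cited papers: the class name and interface are hypothetical, and the choice to start new entries with a cleared reference bit is one of several common variants.

```python
class ClockCache:
    """Sketch of the CLOCK algorithm: a one-bit approximation of LRU.

    Each cached entry carries a reference bit that is set on every hit.
    A "clock hand" sweeps over the entries on eviction, clearing set bits
    and evicting the first entry whose bit is already clear -- roughly the
    least recently used one, without maintaining a full recency list.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.slots = []   # each slot is [key, value, reference_bit]
        self.index = {}   # key -> slot position
        self.hand = 0     # clock hand position

    def get(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None
        self.slots[pos][2] = 1        # hit: set the reference bit
        return self.slots[pos][1]

    def put(self, key, value):
        if key in self.index:
            pos = self.index[key]
            self.slots[pos][1] = value
            self.slots[pos][2] = 1
            return
        if len(self.slots) < self.capacity:
            # New entries start with the bit clear (variants differ here).
            self.index[key] = len(self.slots)
            self.slots.append([key, value, 0])
            return
        # Evict: sweep the hand, clearing set bits, until an entry with a
        # clear bit is found; that entry has not been touched for one sweep.
        while self.slots[self.hand][2] == 1:
            self.slots[self.hand][2] = 0
            self.hand = (self.hand + 1) % self.capacity
        old_key = self.slots[self.hand][0]
        del self.index[old_key]
        self.index[key] = self.hand
        self.slots[self.hand] = [key, value, 0]
        self.hand = (self.hand + 1) % self.capacity
```

The sweep makes each eviction amortized cheap: no per-access reordering is needed, only a single bit update, which is why CLOCK is attractive as an LRU stand-in.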
There are two ways to use the benefit metric and the recency metric simultaneously. One way is to consider recency and benefit in parallel: first use LRU to select a candidate set, then use benefit to decide which entry to replace (Scheuermann et al., 1996). The other way is to use an aging strategy to obtain the benefit accrued in a recent time window and then use this windowed benefit for both candidate-set selection and the replacement decision (Deshpande et al., 1998).
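The first, parallel combination can be sketched as follows, assuming a cache held in an `OrderedDict` ordered from least to most recently used and a hypothetical precomputed `benefit` map; the function name and both parameters are illustrative, not taken from the cited papers.

```python
from collections import OrderedDict

def evict_lru_then_benefit(cache, benefit, k=4):
    """Sketch of the parallel recency/benefit policy: LRU first narrows
    the choice to the k least recently used entries, then the entry with
    the smallest benefit among those candidates is evicted.

    `cache` is an OrderedDict ordered from least to most recently used;
    `benefit` maps each key to its (assumed precomputed) benefit value.
    Returns the evicted key.
    """
    candidates = list(cache.keys())[:k]            # k least recently used
    victim = min(candidates, key=lambda key: benefit[key])
    del cache[victim]
    return victim
```

For example, with five cached queries ordered oldest-first and a candidate set of size three, the policy evicts the candidate with the lowest benefit even if it is not the single least recently used entry.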
DATA STREAM MANAGEMENT

Whereas in traditional database management systems different queries are posed against static data, in many applications a relatively fixed set of processing tasks must be evaluated against an ever-changing sequence of data tuples. Such monitoring applications (Abadi et al., 2003) evaluate their queries against streams of data. In contrast to a database that resides entirely in a set of (virtual) files, a data stream is a rapidly flowing stream of structured data, so vast in volume that it is usually impossible to store the complete data in persistent storage.

Common examples of monitoring applications are the analysis of financial tickers, web click-stream analysis, traffic monitoring, and network traffic analysis. The characteristics of this family of applications have significant implications for storage and query processing, which make it impossible to use conventional DBMSs for such tasks.

These new requirements gave rise to a new class of data management systems, so-called data stream management systems (DSMSs) (Babcock et al., 2002). Although there are similarities between data stream management systems and conventional database management systems, the requirements of data stream analysis necessitate new types of queries and new query evaluation techniques.

Several issues arise in a DSMS. Special care must be taken in the incremental computation of stateful operators such as joins and aggregations, because they could block a query. Queries are usually evaluated only over a window of the most recent data, since otherwise the amount of data to be considered would grow without bound. All operators must process incoming tuples incrementally. Furthermore, continuous query evaluation must take into account common subexpressions of the queries registered with the streams, so that the same operators are executed only once and their results are streamed to the subsequent operators of concurrent queries.

This section surveys different aspects of DSMSs and queries against data streams. After giving a formal definition of the notion of a data stream, we describe different possibilities for defining the portions of streams to be used for query answering. These methods, called window models, are a particularly characteristic feature of DSMSs and a means of computing approximate query answers. Due to the amount of incoming data, storage of streaming data is often possible only in an aggregated form. Therefore, we conclude the discussion with a survey of different techniques for producing such synopses or digests of streaming data.

For the remainder of this section we adopt some definitions by Arasu et al. (2006), since most other models can be reduced to their model. We