View Management Techniques and Their Application to Data Stream Management - Evolving Application Domains of Data Warehousing and Mining


Data warehouses rely heavily on analysis of up-to-date information to support decision makers. The advent of a new class of data management applications, namely data stream management systems (DSMS), provides new opportunities for analysis of timely information. A data stream is a continuous, rapid, time-varying, and transient stream of data. There are connections between DSMS and view management: continuous query processing is related to view maintenance in data warehousing, while multi-query optimization for continuous queries is closely related to view selection in conventional relational DBMS and data warehouses. In this chapter, we give an overview of view maintenance and view selection methods, explain the fundamental issues of data stream management, and discuss how view management techniques from data warehousing are related to data stream management.

The chapter is structured as follows. Section 2 briefly explains the roles of views in data warehouses. Section 3 gives an overview of view maintenance methods and classifies them according to various criteria. Section 4 then explains the view selection problem and presents a taxonomy of existing view selection techniques. Section 5 discusses issues and challenges in data stream management and summarizes recent results in research on data streams. Section 6 discusses the relationship of view management techniques to data stream management, covering similarities, differences, and possible connections between the two. Finally, Section 7 summarizes the chapter and points out directions for future research in view management, data streams, and data warehousing.

Unlike On-Line Transaction Processing (OLTP) systems, which focus on managing common data operations, data warehouses aim at supporting data analysis (i.e., On-Line Analytical Processing, OLAP) and are known for their vast volume of data and complexity of queries. The response time of queries, if evaluated from base tables, is usually too long for users to tolerate, as a huge amount of data has to be processed. Therefore, it is common practice to pre-compute summaries of base tables in order to reduce query response time. The following example illustrates the benefit of materializing views:

Example 1. Consider the TPC-D benchmark (Serlin, 1993), modeling a data cube of sales with three dimensions: part, supplier, and customer. We denote the base table as R(part, supp, cust, sales). The following query is posed by users:

Q:  SELECT part, SUM(sales) AS total
    FROM R
    GROUP BY part;

The following two materialized views can both benefit Q:

V1: SELECT part, cust, SUM(sales) AS total
    FROM R
    GROUP BY part, cust;

V2: SELECT part, supp, SUM(sales) AS total
    FROM R
    GROUP BY part, supp;
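The claim that both views can answer Q can be checked on a small scale. The following is a minimal sketch, not part of the original example: it assumes an in-memory SQLite database, a handful of made-up rows for R, and ordinary tables standing in for materialized views (SQLite has no native materialized views). The helper `answer_q` is a hypothetical name introduced here for illustration.

```python
import sqlite3

# In-memory database with a toy version of the base table R.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (part TEXT, supp TEXT, cust TEXT, sales INTEGER)")

# Hypothetical sample rows; the real TPC-D table has about 6M rows.
rows = [
    ("p1", "s1", "c1", 10),
    ("p1", "s1", "c2", 20),
    ("p1", "s2", "c1", 5),
    ("p2", "s1", "c1", 7),
    ("p2", "s2", "c2", 3),
]
conn.executemany("INSERT INTO R VALUES (?, ?, ?, ?)", rows)

# Emulate materialization by storing each view's result as a table.
conn.execute(
    "CREATE TABLE V1 AS "
    "SELECT part, cust, SUM(sales) AS total FROM R GROUP BY part, cust"
)
conn.execute(
    "CREATE TABLE V2 AS "
    "SELECT part, supp, SUM(sales) AS total FROM R GROUP BY part, supp"
)


def answer_q(source, measure):
    """Evaluate Q against `source`, summing the column `measure` per part."""
    sql = (
        f"SELECT part, SUM({measure}) AS total "
        f"FROM {source} GROUP BY part ORDER BY part"
    )
    return conn.execute(sql).fetchall()


q_from_base = answer_q("R", "sales")   # Q evaluated on the base table
q_from_v1 = answer_q("V1", "total")    # Q rewritten over V1
q_from_v2 = answer_q("V2", "total")    # Q rewritten over V2

print(q_from_base)  # [('p1', 35), ('p2', 10)]
print(q_from_v1 == q_from_base, q_from_v2 == q_from_base)  # True True
```

The rewriting works because SUM is distributive: summing the per-group totals stored in V1 or V2 over the remaining grouping column yields the same per-part totals as summing the raw sales in R.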

Which view is better in terms of query response time or storage cost depends on the statistics of the data. For instance, the statistics of the TPC-D database are as follows:

• R: 6M rows
• V1: 6M rows
• V2: 0.8M rows

It is easy to see that materializing V2 benefits answering Q, because V2 is much smaller to scan than the base table. V1, by contrast, is of little use, since it is comparable in size to the base table.
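The comparison above can be made concrete with a little arithmetic. The sketch below assumes, as a simplification not stated in the text, that the cost of answering Q is proportional to the number of rows scanned. The row counts are those listed in Example 1; the function names are hypothetical.

```python
# Row counts from Example 1 (TPC-D statistics).
ROW_COUNTS = {"R": 6_000_000, "V1": 6_000_000, "V2": 800_000}


def scan_cost(source):
    """Assumed cost model: answering Q costs one unit per row scanned."""
    return ROW_COUNTS[source]


def speedup_over_base(source):
    """How many times fewer rows are scanned than when using base table R."""
    return scan_cost("R") / scan_cost(source)


for view in ("V1", "V2"):
    print(view, speedup_over_base(view))
# V1 1.0
# V2 7.5

# The cheapest source for Q under this model.
best_source = min(ROW_COUNTS, key=scan_cost)
print(best_source)  # V2
```

Under this assumed model, V2 reduces the scan by a factor of 7.5, while V1 gives no reduction at all, which matches the qualitative conclusion of the example.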

Nonetheless, materialization of views comes at some price. On the one hand, materializing views

Views in Data Warehousing

A view can select or restructure data in such a way that an application can use the data more efficiently. Different from On-Line Transaction Processing (OLTP) systems, which focus on managing the
