View Selection and Materialization - Data Warehousing Design and Advanced Engineering Applications

Database Reference

In-Depth Information

Contribution

Motivating Example

We have designed a view selection method based

on statistic information. This method provides the

following features:

Let's start with a simple example showing how

materialized views can improve query perfor-

mance time in distributed context. Let us consider

three sites S 1 , S 2 , S 3 which are connected by a

slow wide-area network. S 1 , S 2 and S 2 , S 3 are

connected by an edge with the speed 100 KB/

sec, while the connection speed between S 2 and

S 3 is 3 MB/sec. Let us consider three relations

A, B, C stored at S 1 , S 2 , S 3 respectively and two

queries q 1 and q 2 defined as follows. q 1 =A equijoin

C is posed frequently at S 2 , and q 2 = A equijoin

B is posed frequently at S 3 . Suppose now that

the query optimizer decides to execute all the

queries at S 1 and the results size of q 1 , q 2 are 1

MB, 2 MB respectively. So, the cost to answer

q 1 each time at S 2 is 10 sec and to answer q 2 at S 3

is 20 sec, in addition to the cost to compute both

queries at S 1 . This cost has to be paid every time

these queries are posed. While, if the result of q 1 ,

q 2 are materialized at S 2 , S 3 respectively, in this

case only both queries has to be computed once

at S1 and the results are shipped to S 2 , S 3 to be

materialized with the transfer cost of 10, 20 sec

respectively. Then, subsequent queries could be

performed with zero cost. Obviously, material-

ized views have significantly reduced the query

response time. However, two problems can arise:

First, sometimes the result of the queries is too

large to fit in the available space at a specific site,

so we can't materialize the entire workload and

we have to select only the most beneficial queries

to be materialized. Second, when the base relation

is updated, then the view should be maintained

and the cost of the view maintenance can cause

overhead to the system. Assume in our example

that the relation C is frequently updated, in this

case materializing q 2 can be less attractive because

the cost that we save for answering q 2 is now

involved for maintaining the materialized view.

Therefore, the view is considered as beneficial if

and only if its materialization reduces significantly

•

It is dynamic: it treats a query as it arrives

•

It is based on statistic measures for esti-

mating the query processing

It is using a replacement policy to material-

•

ize the most beneficial views

The algorithm for finding the best views to

•

materialize is polynomial

Outline

The rest of this chapter is organized as follows. In

Section 2, we present the problem of view selection

and a framework for representing the queries in

order to detect common sub expressions. Section 3

presents our dynamic approach to view selection.

The materialization and replacement policy and

its related algorithms are described in Section 4.

In Section 5, we present a tool based on our view

materialization method for tuning the performance

of the DBMS. We also show that is scalable and

provides good performance. Finally, Section 6

presents related work and Section 7 contains

concluding remarks and future work.

PRELIMINARIES

A view is a derived relation defined by a query in

terms of source relations and/or other views. It is

said to be materialized when its extent is computed

and persistently stored. Otherwise, it is said to be

virtual. We consider Selection-Projection-Join

(SPJ) views that may involve aggregation and a

group by clause as well.

Data Warehousing Design and Advanced Engineering Applications

Search WWH ::

Custom Search

Home