View Selection and Materialization - Data Warehousing Design and Advanced Engineering Applications

Database Reference

In-Depth Information

the query processing cost and without increasing

significantly the view maintenance.

Another point should be taken into account.

Assume in the previous example that q 1 is fre-

quently used at both S 2 , S 3 in this case material-

izing q 1 only at one site, e.g., S 1 can be benefit

enough for the both sites as the communication

cost between these sites are not high. And in this

way the available storage at S 2 is saved for another

benefit materialized view. Finally, this example

demonstrates that the question of selecting appro-

priate views for materialization is not an obvious

issue and should deal with different parameters

like storage space constraint, view maintenance

cost and communication cost.

different queries. The Multi View Materializa-

tion Graph (MVMG) is similar to the AND-OR

DAG representation of queries in multi query

optimization. The MVMG is a bipartite Directed

Acyclic Graph (DAG) composed of two types

of nodes: AND-nodes and OR nodes. Each

AND-node represents an algebraic expression

(Select-Project-Join) with possible aggregate

function. An OR node represents a set of logical

expression that are equivalent (i.e., that yield

the same result). The AND-nodes have only

OR-nodes as children and OR-nodes have only

AND-nodes as children. The MVMG represents

AND-OR DAGs of several queries in a single

DAG. The leaf nodes of the MVMG are equiva-

lence nodes representing the base relations. In

general, for each base relation, there is one leaf

node except in case of a self join. Equivalence

nodes in MVMG correspond to the views that

are candidate to the materialization.

The queries used in this chapter are defined

over the TPC-H schema described in Figure 1 .

The Part table contains information about each

product. The Supplier table describes the sup-

plier of the products. Because a supplier is able

to supply more than one product and a product

may be supplied by several suppliers: there is a

table PartSupp containing information about who

supplies what products, in which quantity and so

on. The Customer table describes the information

concerning the clients like the name, address

and the nation. The sales are represented by two

relations: Orders and Lineitem . The table Orders

contains information about the customer, the price

of ordered items and so on. Finally, the Lineitem

table contains for each ordered item, the ordered

quantity and the corresponding supplier.

Let us consider the query Q 1, which finds

the number of orders of Airbus planes ordered

by different nations.Q 1 : Select N.name, P.brand,

O.orderdate, Sum(L.quantity)From Part P, Cus-

tomer C, Orders O, Nation N, Lineitem LWhere

P.type = 'airplane'and AND P.brand = 'Airbus'

AND P.partkey = L.partkey AND L.orderkey =

VIEW SELECTION PROBLEM

The general problem of view selection is to select

a set of views to materialize that optimizes both

the view maintenance and query processing time

given some constraints like storage space. To find

the optimal solution satisfying all constraints is a

NP-complete problem, therefore it is necessary

to develop heuristics. In this chapter, we present

a dynamic view selection method which includes

a replacement policy of already materialized

views.

More formally, given a set of source relations

R={R 1 ,R 2 ,…,R n } a set of conjunctive queries of

workload Q = {Q 1 ,Q 2 ,…,Q k }, the problem is to

find a set of views to materialize M={V 1 ,V 2 , …,V}

under a storage space constraint. However, in this

paper, the view selection algorithm is applied as

a query arrives. Therefore, the workload is built

incrementally.

Framework for Detecting

Common Views

In this subsection, we present a framework

for representing views to materialize in order

to exhibit common sub-expressions between

Data Warehousing Design and Advanced Engineering Applications

Search WWH ::

Custom Search

Home