ChunkSim - Data Warehousing Design and Advanced Engineering Applications - page 137


Figure 3. Basic aggregation query steps

'X' GROUP BY to_char(l_shipdate,'yyyy-mm'), p_brand, year_month;

This typical query contains group-by attributes, so the aggregation is computed separately for each group. It can be handled using the following scheme: each node applies a slightly modified query to its partial data, and the merging node then applies the same query again to the partial results coming from the processing nodes.
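The scheme above can be sketched with a small, self-contained simulation. This is only an illustration under stated assumptions: the `sales` table, its columns, the toy rows, and the helper names `run_local` and `merge` are hypothetical and are not part of DWPA. Each in-memory SQLite database stands in for one processing node, and only SUM is used, because SUM can be re-applied unchanged at the merging node.

```python
import sqlite3

# Hypothetical toy data: (brand, year_month, quantity) rows split across two nodes.
NODE_ROWS = [
    [("A", "1998-01", 10), ("A", "1998-01", 5), ("B", "1998-02", 7)],
    [("A", "1998-01", 3), ("B", "1998-02", 1), ("B", "1998-03", 4)],
]

SCHEMA = "CREATE TABLE sales (brand TEXT, year_month TEXT, quantity INTEGER)"

# The same GROUP BY query is used locally and again at the merging node.
GROUP_QUERY = """
    SELECT brand, year_month, SUM(quantity) AS quantity
    FROM sales
    GROUP BY brand, year_month
    ORDER BY brand, year_month
"""


def run_query(row_sets):
    """Load the given row sets into a fresh in-memory table and aggregate them."""
    con = sqlite3.connect(":memory:")
    con.execute(SCHEMA)
    for rows in row_sets:
        con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    result = con.execute(GROUP_QUERY).fetchall()
    con.close()
    return result


def run_local(rows):
    """Processing node: aggregate this node's partition only."""
    return run_query([rows])


def merge(partials):
    """Merging node: apply the same query to the union of the partial results."""
    return run_query(partials)


partials = [run_local(rows) for rows in NODE_ROWS]
merged = merge(partials)
print(merged)
```

The merged result equals what a single node would return over all the rows, because the sum of the per-node sums for a group is the sum over the whole group.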

Simple additive aggregation primitives are computed in each node, and the final aggregation function is derived from them. The most common primitives are LS, SS, N, MAX and MIN: the linear sum LS = sum(x), the sum of squares SS = sum(x^2), the number of elements N, and the extremes MAX and MIN.
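As a minimal sketch of how these primitives combine, the following code computes them per node, merges them, and derives AVERAGE and STDDEV from the merged values. The function names and the sample values are hypothetical, each node is assumed to hold at least one value, and STDDEV is taken here to be the population standard deviation, sqrt((SS - LS^2/N)/N).

```python
import math


def local_primitives(values):
    """Compute the additive primitives on one node (values must be non-empty)."""
    return {
        "N": len(values),
        "LS": sum(values),
        "SS": sum(v * v for v in values),
        "MAX": max(values),
        "MIN": min(values),
    }


def merge_primitives(partials):
    """Combine per-node primitives: sums add up, extremes take max/min."""
    return {
        "N": sum(p["N"] for p in partials),
        "LS": sum(p["LS"] for p in partials),
        "SS": sum(p["SS"] for p in partials),
        "MAX": max(p["MAX"] for p in partials),
        "MIN": min(p["MIN"] for p in partials),
    }


def final_aggregates(m):
    """Derive the final aggregation functions from the merged primitives."""
    n, ls, ss = m["N"], m["LS"], m["SS"]
    variance = (ss - ls * ls / n) / n
    return {
        "COUNT": n,
        "SUM": ls,
        "AVERAGE": ls / n,
        # Clamp tiny negative values caused by floating-point rounding.
        "STDDEV": math.sqrt(max(variance, 0.0)),
        "MAX": m["MAX"],
        "MIN": m["MIN"],
    }


# Hypothetical partitions of one column held by two nodes.
node_values = [[2, 4, 4, 4], [5, 5, 7, 9]]
merged = merge_primitives([local_primitives(v) for v in node_values])
result = final_aggregates(merged)
print(result)
```

Only five numbers per group travel from each node to the merging node, regardless of how many rows the node holds. Note that averaging the per-node averages would be wrong whenever the nodes hold different numbers of rows, which is why LS and N are shipped separately.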

Examples of final aggregation functions are given in Equations (2) to (5).

This means that, to determine the local query for each node, a query transformation step needs to replace each AVERAGE expression in the SQL query by a SUM and a COUNT, and each STDDEV (or variance) expression by a SUM, a COUNT and a SUM_OF_SQUARES. Figure 3 shows an example of the aggregation query processing steps created by DWPA.

Having described a basic query processing strategy in this section, we proceed in the next section with background on replication and load-balancing.

Replication, Chunks and Load-Balancing

The use of replication for availability and load-balancing has long been a subject of research. Replication can be considered at multiple levels. Mirrored disk drives (Tandem, 1987) and RAID disks (Patterson et al. 1998) are examples of proposals at the storage organization level; multiple RAID alternatives were proposed, some emphasizing only reliability advantages, others targeting both performance and reliability. At the networked data level, the concept of distributed RAID (ds-RAID) (Stonebraker et al. 1990) glues together distributed storage by software or middleware in a network-based cluster. The Petal project (Lee

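\[ \mathrm{COUNT} = N = \sum_{\text{all nodes}} N_{\text{node}_i} \tag{2} \]

\[ \mathrm{SUM} = LS = \sum_{\text{all nodes}} LS_{\text{node}_i} \tag{3} \]

\[ \mathrm{AVERAGE} = \frac{\sum_{\text{all nodes}} LS_{\text{node}_i}}{\sum_{\text{all nodes}} N_{\text{node}_i}} \tag{4} \]

\[ \mathrm{STDDEV} = \sqrt{\frac{\sum_{\text{all nodes}} SS_{\text{node}_i} - \left(\sum_{\text{all nodes}} LS_{\text{node}_i}\right)^{2} / N}{N}} \tag{5} \]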
