Database Reference
In-Depth Information
Figure 3. Basic aggregation query steps
'X' GROUP BY to_char(l_shipdate,'yyyy-mm'),
p_brand, year_month;
This typical query contains group-by attributes
that allow the aggregation to be determined for
each group. This aggregation can be handled using
the following scheme: each node applies a slightly
modified query to its partial data, and the merging
node then applies the same query again to the
partial results coming from the processing nodes.
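The scheme above can be sketched in Python with in-memory SQLite databases standing in for the processing nodes and the merging node. This is only an illustration under stated assumptions: the table names (lineitem_part, partial_results), the column names and the sample rows are hypothetical, and the sketch does not reproduce DWPA's actual query rewriter. Note that the only change in the merge query is that COUNT becomes a SUM over the partial counts.

```python
import sqlite3

# Hypothetical rows (brand, year_month, quantity), split across two nodes.
NODE_DATA = [
    [("A", "1998-01", 10.0), ("A", "1998-01", 20.0), ("B", "1998-01", 5.0)],
    [("A", "1998-01", 30.0), ("B", "1998-01", 7.0), ("B", "1998-02", 1.0)],
]

# Local query: each node aggregates its own partition per group.
LOCAL_QUERY = """
    SELECT brand, year_month, SUM(quantity) AS sum_qty, COUNT(*) AS n
    FROM lineitem_part
    GROUP BY brand, year_month
"""

# Merge query: same grouping, applied to the partial results.
# The partial counts are added up instead of being counted again.
MERGE_QUERY = """
    SELECT brand, year_month, SUM(sum_qty) AS sum_qty, SUM(n) AS n
    FROM partial_results
    GROUP BY brand, year_month
    ORDER BY brand, year_month
"""


def run_local(rows):
    """Run the local query on one node's partition."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE lineitem_part (brand TEXT, year_month TEXT, quantity REAL)"
    )
    con.executemany("INSERT INTO lineitem_part VALUES (?, ?, ?)", rows)
    result = con.execute(LOCAL_QUERY).fetchall()
    con.close()
    return result


def merge(partials):
    """Combine the partial results from all nodes at the merging node."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE partial_results "
        "(brand TEXT, year_month TEXT, sum_qty REAL, n INTEGER)"
    )
    for part in partials:
        con.executemany("INSERT INTO partial_results VALUES (?, ?, ?, ?)", part)
    result = con.execute(MERGE_QUERY).fetchall()
    con.close()
    return result


merged = merge([run_local(rows) for rows in NODE_DATA])
for row in merged:
    print(row)
```

Each group appears once in the merged output, with the sum and row count taken over all partitions, which is what a single query over the unpartitioned data would return.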
Simple additive aggregation primitives are
computed in each node, from which the final
aggregation function is derived. The most common
primitives are LS, SS, N, MAX and MIN: the linear
sum LS = sum(x), the sum of squares SS = sum(x^2),
the number of elements N, and the extremes MAX
and MIN. Examples of final aggregation functions
are given in equations (2) to (5).
This means that a query transformation step
needs to replace each AVERAGE and STDDEV
(or variance) expression in the SQL query by a
SUM and a COUNT in the first case and by a SUM,
a COUNT and a SUM_OF_SQUARES in the
second case to determine the local query for each
node. Figure 3 shows an example of the aggregation
query processing steps created by DWPA.
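Independently of the figure, the replacement of AVERAGE and STDDEV by additive primitives can be illustrated with a minimal Python sketch. The function names and sample values are hypothetical. The standard deviation is computed as sqrt((SS - LS^2/N) / N), which is the population form; a sample standard deviation would divide by N - 1 instead. The sketch also assumes every node holds at least one value.

```python
import math
import statistics


def local_primitives(values):
    """Additive primitives computed on one node's partial data."""
    return {
        "LS": sum(values),                 # linear sum
        "SS": sum(v * v for v in values),  # sum of squares
        "N": len(values),                  # number of elements
        "MAX": max(values),
        "MIN": min(values),
    }


def merge_primitives(parts):
    """Merging node: add up LS, SS and N; take the extremes of MAX and MIN."""
    return {
        "LS": sum(p["LS"] for p in parts),
        "SS": sum(p["SS"] for p in parts),
        "N": sum(p["N"] for p in parts),
        "MAX": max(p["MAX"] for p in parts),
        "MIN": min(p["MIN"] for p in parts),
    }


def final_aggregates(m):
    """Derive the final aggregation functions from the merged primitives."""
    n = m["N"]
    variance = (m["SS"] - m["LS"] ** 2 / n) / n
    return {
        "COUNT": n,
        "SUM": m["LS"],
        "AVERAGE": m["LS"] / n,
        # Guard against a tiny negative value caused by rounding.
        "STDDEV": math.sqrt(max(variance, 0.0)),
        "MAX": m["MAX"],
        "MIN": m["MIN"],
    }


# Hypothetical data partitioned over three nodes.
node_values = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]

merged = merge_primitives([local_primitives(v) for v in node_values])
result = final_aggregates(merged)

# Cross-check against a computation over the unpartitioned data.
all_values = [v for part in node_values for v in part]
print(result["AVERAGE"], statistics.mean(all_values))
print(result["STDDEV"], statistics.pstdev(all_values))
```

Only five numbers per group travel from each node to the merging node, regardless of how many rows the node scanned.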
Given the basic query processing strategy
described in this section, we proceed in the next
section with background on replication and
load-balancing.
Replication, Chunks and Load-Balancing
The use of replication for availability and
load-balancing has long been a subject of research.
Replication issues can be considered at multiple
levels. Mirrored disk drives (Tandem, 1987) and
RAID disks (Patterson et al. 1998) are examples of
proposals at the storage organization level.
Multiple RAID alternatives were proposed, some
emphasizing only reliability, others targeting
both performance and reliability. At the networked
data level, the concept of distributed RAID
(ds-RAID) (Stonebraker et al. 1990) glues together
distributed storage by software or middleware in
a network-based cluster. The Petal project (Lee
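$$\mathrm{COUNT} = N = \sum_{\mathit{all\_nodes}} N_{\mathit{node}_i} \tag{2}$$

$$\mathrm{SUM} = LS = \sum_{\mathit{all\_nodes}} LS_{\mathit{node}_i} \tag{3}$$

$$\mathrm{AVERAGE} = \frac{\sum_{\mathit{all\_nodes}} LS_{\mathit{node}_i}}{\sum_{\mathit{all\_nodes}} N_{\mathit{node}_i}} \tag{4}$$

$$\mathrm{STDDEV} = \sqrt{\frac{\sum_{\mathit{all\_nodes}} SS_{\mathit{node}_i} - \left(\sum_{\mathit{all\_nodes}} LS_{\mathit{node}_i}\right)^{2} / N}{N}} \tag{5}$$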