2 Related Work
Aggregate query processing has been studied in many research works [5]. To the
best of our knowledge, however, few of them consider communication cost when
optimizing aggregate query processing. We analyzed some of the works that
optimize aggregate query operations. Building on them, we propose storage
structures that reduce not only the cost of query operations but also the
communication overhead incurred in cloud data warehouses.
Some of the earlier papers that optimize aggregate query processing are
[2], [14] and [22]. These papers improve query response time by pushing the
group-by operation down in the query tree. W. Yan [22] proposed two kinds of
transformations, namely eager aggregation and lazy aggregation. In eager
aggregation, the group-by operation is pushed down in the query tree, while
in lazy aggregation it is pushed up. We use the transformations of [22] in
our system, along with our PK-map and Tuple-index-map, to generate an
optimized query plan for processing aggregate queries.
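The difference between the two transformations can be illustrated with a minimal Python sketch. The toy tables `orders` and `customers`, their columns, and both function names are assumptions made for illustration only; they are not taken from [22] or from our system. The query being evaluated is "total order amount per region".

```python
from collections import defaultdict

# Hypothetical toy tables (illustrative only).
orders = [(1, 10), (1, 5), (2, 7), (3, 3), (2, 1)]   # (customer_id, amount)
customers = {1: "EU", 2: "US", 3: "EU"}              # customer_id -> region

def lazy_aggregation(orders, customers):
    """Group-by above the join: join every order row first, then aggregate."""
    totals = defaultdict(int)
    for cust_id, amount in orders:
        region = customers[cust_id]      # join step, one lookup per order row
        totals[region] += amount         # group-by on the joined rows
    return dict(totals)

def eager_aggregation(orders, customers):
    """Group-by below the join: pre-aggregate orders, then join the partials."""
    partial = defaultdict(int)
    for cust_id, amount in orders:
        partial[cust_id] += amount       # early group-by on the join key
    totals = defaultdict(int)
    for cust_id, subtotal in partial.items():
        totals[customers[cust_id]] += subtotal   # join one row per customer
    return dict(totals)
```

Both functions return the same totals. In the eager variant only one partial row per customer reaches the join, which is what makes the transformation attractive when the join input would otherwise have to be shipped between nodes.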
Order-Optimization [4] presents techniques to reduce the number of sorts
needed for query processing by finding the cover set using keys, predicates and
indexes. Since our proposed map structures are already sorted on keys, we
eliminate most of the sort operations required for join operations on the tables.
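To show why key-sorted inputs make a separate sort step unnecessary, the following is a minimal sketch of a merge join over two inputs that are already ordered on the join key. It is a generic textbook merge join, not the PK-map implementation; the function name `merge_join` and the tuple layout are assumptions, and the left input is assumed to have unique keys (as a primary-key side would).

```python
def merge_join(left, right):
    """Join two lists of (key, value) pairs that are already sorted by key.

    Assumes keys in `left` are unique; `right` may contain duplicate keys.
    No sorting is performed: each input is scanned once, front to back.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, lv = left[i]
        rk, rv = right[j]
        if lk < rk:
            i += 1                      # left key has no (further) match
        elif lk > rk:
            j += 1                      # right key has no match on the left
        else:
            out.append((lk, lv, rv))    # keys match: emit the joined row
            j += 1                      # stay on the left row for duplicates
    return out
```

For example, `merge_join([(1, "a"), (2, "b"), (4, "d")], [(1, "x"), (1, "y"), (3, "z"), (4, "w")])` yields the rows for keys 1 (twice) and 4 in a single linear pass.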
Coloring-Away [23] proposed query plan generation using a tree-coloring
mechanism. That paper considers both communication cost and data re-partitioning,
and uses tree coloring to generate an optimal query plan. In our framework, we
optimize the query operations that cause these performance problems, such as
aggregates and joins, by doing sort and group-by on the fly.
Avoid-Sort-Groupby [24] proposed a query plan refining algorithm through
which unnecessary sorting and grouping can be eliminated from the query plan.
It uses inference strategies and order properties of the relation table to find
unnecessary sorting or grouping. T. Neumann [19] points out that it is necessary
to consider both ordering and grouping to generate the query plan.
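As a small illustration of how a known order property makes a sort redundant, the sketch below aggregates rows that are already ordered on the grouping key in one streaming pass. It is not the refinement algorithm of [24]; the function name `stream_group_sum` and the row layout are assumptions for illustration.

```python
from itertools import groupby
from operator import itemgetter

def stream_group_sum(rows):
    """Sum values per key for rows of (key, value) already ordered by key.

    Because equal keys are adjacent, no sort and no hash table are needed;
    each group is finished as soon as the key changes.
    """
    return [(key, sum(value for _, value in group))
            for key, group in groupby(rows, key=itemgetter(0))]
```

If the input were not ordered on the key, the same key could appear in several separate groups, which is exactly the case in which a plan would still need a sort (or a hash-based group-by).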
Cooperative-Sort [25] presented an evaluation technique for sorting tables.
This technique targets queries that need multiple sort orders of the same
table on different attributes. It minimizes the I/O operations of successive
sort operations, which reduces the overall query cost.
Pre-computing aggregates has been proposed by many other researchers [6] [16]
and is useful for decision support systems. Decision support systems store
huge amounts of historical data for analysis and decision-making. These databases
are updated infrequently (once an hour or once a day), in batches. This makes it
easy to compute the aggregates ahead of time and store them as data cubes or
materialized views. Recently, the interval between historical and current data
has shrunk considerably. This makes it complicated and time-consuming to
re-compute data cubes or materialized views every time the data is updated.
Recent research by companies like HP, Oracle and Teradata [8] [18] [21] shows
new parallelization schemes for processing joins and aggregate operations,
eliminating data cubes. So, in this paper we concentrate on optimizing aggregate
queries without pre-computation.