Optimizing Aggregate Query Processing
in Cloud Data Warehouses
Swathi Kurunji, Tingjian Ge, Xinwen Fu,
Benyuan Liu, Amrith Kumar, and Cindy X. Chen
University of Massachusetts Lowell, MA, USA
{skurunji,ge,xinwenfu,bliu,cchen}@cs.uml.edu,
amrith@parelastic.com
Abstract. In this paper, we study and optimize aggregate query
processing in a highly distributed Cloud Data Warehouse, where each
database stores a subset of relational data in a star schema. Existing
aggregate query processing algorithms, such as the Two-phase algorithm,
focus on optimizing various query operations but give less importance to
communication cost overhead. However, in cloud architectures, the
communication cost overhead is an important factor in query processing.
Thus, we consider communication overhead to improve distributed query
processing in such cloud data warehouses. We then design query-processing
algorithms by analyzing the aggregate operation and eliminating most of
the sort and group-by operations with the help of integrity constraints
and our proposed storage structures, PK-map and Tuple-index-map.
Extensive experiments on PlanetLab cloud machines validate the
effectiveness of our proposed framework in improving the response time,
reducing node-to-node interdependency, minimizing communication
overhead, and reducing the database table access required for aggregate
queries.
Keywords: Aggregate Operation, Communication Cost, Read-Optimized
Database, Data Warehouse, Cloud Storage, Query Optimization.
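For readers unfamiliar with the Two-phase algorithm named in the abstract, the following is a minimal Python sketch of generic two-phase aggregation, not the authors' implementation. Each node first aggregates its own rows, and a coordinator then merges the partial results, so only one partial per group per node crosses the network. The function names, the (sum, count) partial state, and the sample data are illustrative assumptions.

```python
from collections import defaultdict

def local_aggregate(rows):
    """Phase 1 (on each node): group local rows, keep a partial (sum, count)."""
    partial = defaultdict(lambda: [0.0, 0])
    for key, value in rows:
        partial[key][0] += value
        partial[key][1] += 1
    return {k: tuple(v) for k, v in partial.items()}

def global_merge(partials):
    """Phase 2 (on the coordinator): merge the partials shipped by every node."""
    merged = defaultdict(lambda: [0.0, 0])
    for partial in partials:
        for key, (s, c) in partial.items():
            merged[key][0] += s
            merged[key][1] += c
    return {k: {"sum": s, "count": c, "avg": s / c} for k, (s, c) in merged.items()}

# Hypothetical (group key, measure) rows held by two nodes.
node_a = [("east", 10.0), ("west", 5.0), ("east", 20.0)]
node_b = [("west", 15.0), ("east", 30.0)]

partials = [local_aggregate(node_a), local_aggregate(node_b)]
result = global_merge(partials)

# Communication proxy: partial tuples shipped versus raw rows held.
shipped = sum(len(p) for p in partials)
raw_rows = len(node_a) + len(node_b)
```

In this toy case the nodes ship four partial tuples instead of five raw rows. The saving grows with the number of rows per group, but the merge step still requires every node to send data to the coordinator, which is the kind of communication overhead the paper sets out to reduce.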
1 Introduction
Data Warehouses or decision support systems use join, group-by, and
aggregate operations very often in formulating analytical queries. A survey
conducted by Oracle [15] shows that 36% of Data Warehouse users are having
performance problems. Common performance bottlenecks include loading large
data volumes into a data warehouse, poor metadata scalability, running reports
that involve complex table joins and aggregation, increasing complexity of
data (dimensions), and presenting time-sensitive data to business managers.
Efficient evaluation of complex queries (i.e., aggregate and multi-join queries)
is an important issue in applications that manage and analyze multidimensional
data (analytical business data, scientific data, spatial data, etc.). Efficient
execution of such queries in large-scale and dynamic cloud databases is a challenging
 