Optimizing Aggregate Query Processing in Cloud Data Warehouses - Data Management in Cloud, Grid and P2P Systems - page 8


Algorithm 2 Predicate/Join Processing Algorithm

Input: Table t ∈ T, predicates of Q, required attributes in result

Output: Result of t

1: if there is a predicate on non-PK/non-FK then
2:     if d == 0 for t then
3:         Apply predicate on t to get the record ids
4:         Store the record-id mapping in the format
5:         (rec-id1, rec-id2, ...)
6:         Communicate if necessary with other nodes
7:     else if any table t1 with d1 <= d referenced by t then
8:         Apply predicate on t
9:         Update the mapping with rec-ids of t
10:        Perform line 9
11:        Eliminate mappings which have no match for t
12:    else
13:        Perform similar to line 6, 9 and 14
14:    end if
15: else if there is a predicate on PK or FK then
16:    if d == 0 for t then
17:        Scan PK-map and tuple-index-map
18:        Perform line 6 to 8
19:    else
20:        Scan PK-map and tuple-index-map for those rec-ids stored for table t1 with d1 <= d that is referenced by t
21:        Perform 12 and 14
22:    end if
23: end if
24: Scan tables of T for final mappings (rec-id1, ...) to get the value of other attributes in the select statement of Q
25: return Result

Fig. 4. Query Processing Algorithms: (a) Aggregate Query Processing Algorithm; (b) Join Processing Algorithm [17]
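The record-id mapping steps of Algorithm 2 (lines 3 to 5, 8 to 11, and 24) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy tables, the function names, and in particular the shortcut of storing the parent rec-id directly in the referencing row, which stands in for the PK-map and tuple-index-map lookups whose layout is not shown on this page. It also assumes the referenced table is the last one already present in each mapping.

```python
# Hypothetical toy tables (rec-id -> row); the schema is illustrative only.
region = {0: {"r_name": "ASIA"}, 1: {"r_name": "EUROPE"}}
nation = {
    0: {"n_name": "CHINA", "region_rec": 0},
    1: {"n_name": "FRANCE", "region_rec": 1},
    2: {"n_name": "JAPAN", "region_rec": 0},
}


def select_rec_ids(table, predicate):
    """Lines 3-5: apply a predicate and store one-element rec-id mappings."""
    return [(rec_id,) for rec_id, row in sorted(table.items()) if predicate(row)]


def extend_mappings(mappings, table, fk_field, predicate=lambda row: True):
    """Lines 8-11: extend each mapping with matching rec-ids of a referencing
    table; mappings without any match are eliminated."""
    by_parent = {}
    for rec_id, row in sorted(table.items()):
        if predicate(row):
            by_parent.setdefault(row[fk_field], []).append(rec_id)
    extended = []
    for mapping in mappings:
        # Assumption: the referenced table is the last entry in the mapping.
        for child in by_parent.get(mapping[-1], []):
            extended.append(mapping + (child,))
    return extended


def fetch(mappings, tables, fields):
    """Line 24: read the remaining select-list attributes for final mappings."""
    result = []
    for mapping in mappings:
        row = {}
        for rec_id, table, names in zip(mapping, tables, fields):
            for name in names:
                row[name] = table[rec_id][name]
        result.append(row)
    return result


mappings = select_rec_ids(region, lambda r: r["r_name"] == "ASIA")
mappings = extend_mappings(mappings, nation, "region_rec")
rows = fetch(mappings, [region, nation], [["r_name"], ["n_name"]])
```

In this sketch only rec-ids travel between steps, and attribute values are read once at the end, which mirrors the order of operations in the listing. Communication between nodes (line 6) is omitted.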

Also, our maps are already sorted on keys, which eliminates most of the sort operations. Finally, we retrieve the remaining attributes required for the result.
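To see why key-sorted maps remove the need for a separate sort step, consider a minimal merge-style join in Python. The representation used here, two lists of (key, rec-id) pairs already ordered by key, is an assumption for illustration and not the paper's actual map layout; the function name is hypothetical as well.

```python
def merge_join_sorted(left, right):
    """Join two lists of (key, rec_id) pairs that are already sorted by key.

    Because both inputs are sorted, a single forward pass suffices and no
    sort is performed before joining.
    """
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Find the run of equal keys on the right side.
            j_end = j
            while j_end < len(right) and right[j_end][0] == left_key:
                j_end += 1
            # Pair every left entry with that key against the whole run.
            while i < len(left) and left[i][0] == left_key:
                for k in range(j, j_end):
                    out.append((left_key, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return out


# Hypothetical inputs: primary-key entries and foreign-key entries, both sorted.
pk_entries = [(1, 10), (2, 11), (4, 12)]
fk_entries = [(1, 100), (1, 101), (3, 102), (4, 103)]
joined = merge_join_sorted(pk_entries, fk_entries)
```

Each input is traversed once, so the cost is linear in the input sizes plus the number of matches, rather than including the cost of sorting either side.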

4 Performance Evaluation

In this section, we present a performance study to show the effectiveness of our proposed PK-map and Tuple-index-map structures when processing aggregate queries (using the algorithms in Figure 4a and Figure 4b). We compare the performance of MySQL and our proposed framework on a large-scale cloud network called PlanetLab, using 150GB of TPC-H star schema data.

PlanetLab [12] [13] is a geographically distributed computing platform available as a testbed for deploying, evaluating, and accessing planetary-scale network services. It is currently composed of around 1050 nodes (servers) at 400 sites (locations) worldwide.

For the performance study in this paper, we chose 50 PlanetLab machines worldwide running the Red Hat 4.1 operating system. Each machine has a 2.33GHz Intel Core 2 Duo processor, 4GB RAM, and 10GB disk space. We installed regular MySQL on all of the machines to perform the experiments.

We generated 150GB of data using the data generator "dbgen", provided by the TPC-H benchmark, and distributed it to 50 PlanetLab machines. Each of these machines stores around 3GB of data fragments of the TPC-H schema relations. We generated PK-maps and Tuple-index-maps, and then horizontally partitioned them
