Optimizing Aggregate Query Processing in Cloud Data Warehouses - Data Management in Cloud, Grid and P2P Systems

Database Reference

In-Depth Information

select N.name, SUM(O.totalprice)

from ORDERS O, CUSTOMER C, NATION N

where N.name = "UNITED STATES"

or N.name = "INDIA"

and N.nationkey = C.nationkey

and C.custkey = O.custkey

group by N.name;

Fig. 3. Modified Query 1 of Table 3

On the other hand, Figure 2b is the plan generated for the same Query 1 using

our framework. Here, instead of scanning Customer and Orders table, we scan

two PK-maps (NationKey-map and CustKey-map), and its associated Tuple-

index-maps. These maps are very small compared to scanning tables. The join

operation in Figure 2b is not equivalent to join operation in Figure 2a. Instead,

it is performed by scanning map and applying associated filtering constraint.

Algorithms are shown in Figure 4a - 4b and comparison results in Section 4.

Aggregate on Non-Primary or Non-Foreign Key: When there is an aggre-

gate operation on Non-Primary Key (NPK) or Non-Foreign Key (NFK) of the ta-

ble, then we need to scan the table to find the correct result. But, we use inference

to push down the group-by in the query tree so that we can scan only the required

portion of table. We also replace the scanning table with maps wherever possible.

For example, suppose if we have SUM (O.totalprice) instead of COUNT (*)

of Query 1 as shown in Figure 3. We push down the group-by N.name (as in

Figure 2b) to reduce the number of rows to be scanned from the orders table.

This reduces the input rows of final aggregate operation.

Group-by Optimization: When there are multiple attributes in the group-by

clause, we associate certain order to those attributes and push down group-by

in the query tree. As we know that the group-by is a set operation and join

result will carry the sort order, we can change the sequence of attributes in

group-by clause. So, we change the sequence of attributes in group-by clause

corresponding to the sequence of table processing in our algorithm. By doing

this we can eliminate non-relevant data in the early stage of query processing as

well as achieve same result as we perform group-by in the final stage.

Query Processing Using Proposed Method: In Algorithm 1 shown in

Figure 4, we first sort the tables referenced in the query according to their

relationship in the schema. We sort tables starting from the table that does not

have any foreign keys (depth 0) [17]. For example, Query 1's order of processing

will be nation

orders. In addition, we consider all the constraints

in where clause and group - by while deciding on tables processing order.

We then consider aggregate operation to decide whether to scan the table

from the database or to scan our map structure (Line 3-15). If there are PK

or FK constraints, we scan our maps. As all PK to FK mapping information

is available in maps, data movement between nodes is eliminated in lines 5-8.

→

customer

→

Data Management in Cloud, Grid and P2P Systems

Search WWH ::

Custom Search

Home