Databases Reference
In-Depth Information
Inside execution plan for this query, join to generate candidate itemsets is
executed at each node independently. And then after counting itemsets locally,
nodes exchange the local counts by applying hash function on the itemsets in
order to determine overall support count of each itemset. Another data exchange
occurs when large itemsets are distributed to every nodes. Finally each node
independently execute join again to create the transaction data.
Execution plan for SQL statement of pass k which is shown in Figure 1 is here:
Pass k:
1.
Build hash table from R_k-1.Probe the hash table using table R_k-1 itself,
insert tuples that satisfy select conditions into table RTMP_k.
2.
Count support in the local node.
3.
In order to count global support, distribute the count result to processing
nodes by applying hash function on items.
4.
Count global support for each item.
5.
Broadcast items that satisfy the minimum support to every nodes and insert
to table C_k.
6.
Build hash table from C_k.Probe hash table using transaction data, insert
tuples that satisfy select conditions into table R_k.
The detail of the execution plan is shown in Figure 4. Figure 5 give graphical
description of the execution plan.
We have also examined some other alternatives such as instead of distributing
the result of support counting in each node to obtain global item support we
distribute the tuples each pass by applying hash function on items before support
counting, therefore all support counting can be performed in one step. But
preliminary test for this plan has shown poor performance. In order to suppress
the communication cost this plan requires most of items remained at sender node.
Our preliminary test showed that more tuples are redistributed so this plan is
impractical unless the query processor is equipped with mechanism to detect the
distribution of items in each node.
This kind of query utilyze the so called self join extensively. The self join is a
join between the same table. Proper optimization of self join will definitely make
the execution faster.
DBKernel provides facility to trace the usage of resource during execution as
shown in figure 6. Second pass occupies interval from 3s until 23.5s.
Search WWH ::




Custom Search