Figure 2: Execution time (left) and speedup ratio (right)
itemsets; thus it is the most time-consuming phase. Figure 3 (right) shows the speedup
ratio for each pass. The later the pass, the smaller the candidate itemsets become, and
the non-negligible parallelization overhead becomes dominant, especially in passes
later than five. Depending on the size of the candidate itemsets, we could change the
degree of parallelization; that is, we should reduce the number of nodes in later
passes, as sketched below. Such extensions will need further investigation.
Figure 3: Pass analysis (minimum support 5%). Contribution of each pass to execution
time (left) and speedup ratio of each pass (right)
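The following is a minimal sketch, in plain Python, of the extension suggested above: shrink the degree of parallelization when the candidate itemset table becomes small, so that the parallelization overhead of the later passes no longer dominates. The threshold and the candidate counts used in the example are illustrative assumptions, not measurements from the paper.

# Pick the number of nodes for a pass from the candidate itemset count.
# The per-node threshold is an illustrative assumption.
def nodes_for_pass(num_candidates: int,
                   max_nodes: int,
                   min_candidates_per_node: int = 10_000) -> int:
    if num_candidates <= 0:
        return 1
    wanted = num_candidates // min_candidates_per_node
    return max(1, min(max_nodes, wanted))

# Early passes with many candidates use all nodes; later passes fall back
# to a few nodes, avoiding the overhead seen in Figure 3 (right).
for pass_no, candidates in enumerate([500_000, 200_000, 60_000, 9_000, 800], start=1):
    print(pass_no, nodes_for_pass(candidates, max_nodes=16))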
5.4 Execution Behaviour
The original SETM algorithm assumes execution using sort-merge join 4). Although
the authors showed that sort-merge join performs better than nested-loop join with
indexes, the sort process is hard to parallelize. Inside the database server on our
system, relational joins are executed as hash joins, and tables are partitioned over
the nodes by hashing. As a result, parallelization efficiency is much improved. This
approach is very effective for large-scale data mining.
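The sketch below illustrates, in plain Python rather than in the DB Kernel itself, why this strategy parallelizes well: both relations are partitioned over the nodes by hashing the join key, and each node then performs a purely local build/probe hash join, so no global sort and no cross-node merge is needed. The table and column names (a SETM-style self-join on the transaction id) are illustrative assumptions.

from collections import defaultdict

def hash_partition(rows, key, num_nodes):
    # Assign each row to a node by hashing its join-key value.
    parts = [[] for _ in range(num_nodes)]
    for row in rows:
        parts[hash(row[key]) % num_nodes].append(row)
    return parts

def local_hash_join(left, right, key):
    # Classic build/probe hash join executed independently on one node.
    build = defaultdict(list)
    for row in left:
        build[row[key]].append(row)
    return [(l, r) for r in right for l in build.get(r[key], [])]

def parallel_hash_join(left, right, key, num_nodes):
    left_parts = hash_partition(left, key, num_nodes)
    right_parts = hash_partition(right, key, num_nodes)
    result = []
    for node in range(num_nodes):      # each iteration models one node's local work
        result.extend(local_hash_join(left_parts[node], right_parts[node], key))
    return result

# SETM-style self-join of transaction data on the transaction id.
sales = [{"tid": 1, "item": "a"}, {"tid": 1, "item": "b"}, {"tid": 2, "item": "a"}]
print(parallel_hash_join(sales, sales, "tid", num_nodes=4))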
The DB Kernel allows the user to freely customize the execution plan of any query.
We have designed the execution plan to accommodate hash joins while suppressing
communication among the nodes, in order to achieve a better speedup ratio.
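The kind of plan decision involved can be hinted at with the hypothetical sketch below; the DB Kernel's actual plan interface is not described in the text, so the Scan type, its fields, and the table names are purely illustrative. The point is that when both join inputs are already hash-partitioned on the join key, the hash join runs node-locally and no tuples have to be exchanged over the network.

from dataclasses import dataclass

@dataclass
class Scan:
    table: str
    partition_key: str   # column the table is hash-partitioned on across nodes

def join_needs_redistribution(left: Scan, right: Scan, join_key: str) -> bool:
    # True if at least one input must be re-hashed across the network.
    return left.partition_key != join_key or right.partition_key != join_key

# Both relations partitioned on the transaction id: local join, no communication.
print(join_needs_redistribution(Scan("SALES", "tid"), Scan("R1", "tid"), "tid"))   # False
# Mismatched partitioning would force inter-node redistribution of one input.
print(join_needs_redistribution(Scan("SALES", "item"), Scan("R1", "tid"), "tid"))  # True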