Parallel Execution of SQL Based Association Rule Mining - Nontraditional Database Systems

Figure 6: Execution trace (200k, 0.5%, 5 nodes)

Candidate generation occurs during the interval from 3.5s to 12s, shown in the
figure as phase #1. In the first half of this phase, heavy probe operations during
the join result in 100% CPU load and low disk read throughput. In the latter part
of the phase, execution becomes disk I/O bound as the result of the join is stored
on disk. CPU-bound behavior is also observed in other phases that involve hash
table probing, such as candidate matching (phase #6, 21.5s-23.5s), and during
global support gathering (phase #2, 12s-16.5s), which employs hash table updating
for aggregation. However, significant network throughput dominates the global
support gathering (phases #3 and #4, 16.5s-21.5s), when the processing nodes
exchange their local support counts.
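
The pattern behind these phases can be illustrated with a minimal single-process sketch in Python. It is not the SQL implementation measured in the trace. It only mimics the two steps the text mentions: probing a hash table of candidates to count local support, then exchanging the local counts so that each itemset is summed in one place. The function names (`local_support`, `owner`, `gather_global_support`) and the CRC-based routing of itemsets to nodes are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
import zlib


def local_support(transactions, candidates):
    """Count, on one node, how many local transactions contain each candidate.

    `candidates` is a set of sorted tuples, all of the same length k.
    """
    counts = Counter()
    k = len(next(iter(candidates))) if candidates else 0
    for items in transactions:
        # Probe the candidate hash table with every k-subset of the transaction.
        for subset in combinations(sorted(set(items)), k):
            if subset in candidates:
                counts[subset] += 1
    return counts


def owner(itemset, num_nodes):
    """Hypothetical routing rule: map an itemset to the node that sums it."""
    key = ",".join(map(str, itemset)).encode()
    return zlib.crc32(key) % num_nodes


def gather_global_support(local_counts_per_node, min_support):
    """Simulate the exchange of local counts and the global aggregation."""
    num_nodes = len(local_counts_per_node)
    inbox = [Counter() for _ in range(num_nodes)]
    for counts in local_counts_per_node:
        for itemset, c in counts.items():
            # In a real cluster this line would be a network send.
            inbox[owner(itemset, num_nodes)][itemset] += c
    # Each owner keeps only the itemsets that reach the minimum support.
    frequent = {}
    for node_counts in inbox:
        for itemset, c in node_counts.items():
            if c >= min_support:
                frequent[itemset] = c
    return frequent


# Two "nodes", each holding a partition of the transaction data.
partitions = [[[1, 2, 3], [1, 2]], [[2, 3], [1, 2, 3]]]
candidates = {(1, 2), (1, 3), (2, 3)}
local = [local_support(p, candidates) for p in partitions]
frequent = gather_global_support(local, min_support=3)
```

In this sketch the probing loop corresponds to the CPU-bound work, while the routing loop stands in for the network-bound exchange of local support counts.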

6 Performance Evaluation using Commercial Parallel RDBMS

Since the size of databases and the amount of processing power they require have
increased dramatically, parallel processing capability has become a must for
commercial RDBMSs. It is interesting to know whether currently available
technology can achieve sufficient performance when handling complex queries such
as association rule mining.

6.1 Parallel Execution Environment

In our experiment we employed a commercial parallel RDBMS, IBM DB2 UDB EEE
version 6.1, on an IBM UNIX parallel server system, the IBM RS/6000 SP. 12 nodes
