Parallel Execution of SQL Based Association Rule Mining - Nontraditional Database Systems

Furthermore, the query can easily be enhanced and customized as needed, since it is well defined and based on simple concepts.

3. Integration with RDBMS

Seamless integration with an RDBMS reduces maintenance cost and maximizes portability, since differences between platforms can be absorbed by the RDBMS. In addition, mature technologies used in RDBMSs, such as query optimization, parallelization, indexes, and checkpoints, are available at no extra cost.

In our evaluation we employ a modified version of SETM. For the implementation on a commercial parallel RDBMS, we could utilize other techniques to enhance the query. Here we introduce the use of views and subqueries to reduce disk I/O.
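As an illustration of the view-and-subquery idea, the sketch below defines R_1 (the transaction tuples restricted to large items, described in Section 4.1) as a view whose large-item filter is an inline subquery, so creating it writes no intermediate rows. It is a minimal sketch, not the query used in our evaluation: it assumes an in-memory SQLite database in place of a commercial parallel RDBMS, the SALES(TID, item) schema of Section 4, hypothetical sample data, and a hypothetical absolute support threshold MIN_SUPPORT.

```python
import sqlite3

MIN_SUPPORT = 2  # hypothetical absolute threshold (number of transactions)

conn = sqlite3.connect(":memory:")  # SQLite stands in for the parallel RDBMS
conn.execute("CREATE TABLE SALES (TID INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "A"), (3, "C")],
)

# R_1 as a view: no rows are stored when it is created; the large-item
# filter is an inline subquery evaluated whenever R_1 is read.
# (SQLite does not allow bound parameters in views, so the int is inlined.)
conn.execute(
    f"""
    CREATE VIEW R_1 AS
    SELECT TID, item FROM SALES
    WHERE item IN (SELECT item FROM SALES
                   GROUP BY item
                   HAVING COUNT(*) >= {MIN_SUPPORT})
    """
)

# R_1 is catalogued as a view, yet downstream queries read it like a table.
kind = conn.execute("SELECT type FROM sqlite_master WHERE name = 'R_1'").fetchone()[0]
rows = conn.execute("SELECT TID, item FROM R_1 ORDER BY TID, item").fetchall()
print(kind)  # view
print(rows)  # [(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B'), (3, 'A')]
```

Whether a view or subquery actually saves disk I/O depends on how the optimizer plans it; an engine may still materialize intermediate results internally.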

Recently, a pure SQL implementation of the well-known Apriori algorithm 2) has been reported, but its performance falls far behind that of its object-oriented SQL extensions and of other, more loosely integrated approaches 11).

Sarawagi et al. extended the query to mine generalized association rules with a taxonomy 10). They also extended it further to handle sequential patterns. Analysis of the execution plan has provided hints for improving the performance of the Apriori-based query 12).

4 Representation of Transaction Data

Transaction data can be represented in a relational database in first normal form, as illustrated in Table 1. The schema of the table is SALES(TID, item), where TID is the transaction ID and item is the item code or item name. For each customer transaction, one tuple per purchased item is inserted into SALES.
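The mapping from customer transactions to SALES tuples can be sketched with Python's built-in sqlite3 module. The transactions and item names are hypothetical sample data and SQLite stands in for the RDBMS; the point is only that a transaction containing three items becomes three (TID, item) rows.

```python
import sqlite3

# Hypothetical sample data: each transaction ID maps to the items bought together.
transactions = {
    1: ["bread", "milk"],
    2: ["bread", "butter", "milk"],
    3: ["beer"],
}

conn = sqlite3.connect(":memory:")  # SQLite stands in for the RDBMS
conn.execute("CREATE TABLE SALES (TID INTEGER, item TEXT)")

# First normal form: one (TID, item) tuple per item of each transaction.
rows = [(tid, item) for tid, items in transactions.items() for item in items]
conn.executemany("INSERT INTO SALES (TID, item) VALUES (?, ?)", rows)

n = conn.execute("SELECT COUNT(*) FROM SALES").fetchone()[0]
tid2 = conn.execute("SELECT TID, item FROM SALES WHERE TID = 2 ORDER BY item").fetchall()
print(n)     # 6
print(tid2)  # [(2, 'bread'), (2, 'butter'), (2, 'milk')]
```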

4.1 Modified SETM

The first SQL query proposed for mining flat association rules is SETM 4). In our experiments we employed an ordinary, standard SQL query similar to the SETM algorithm, which we modified to enable hash join execution. The query is shown in Figure 1.

In the first pass we simply count the occurrences of each item. Items that satisfy the minimum support are inserted into the large itemset table C_1, which takes the form (item, item count). Then the transaction data that match the large itemsets are stored in R_1.
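The first pass can be sketched in standard SQL, run here through Python's sqlite3 module. This is a minimal illustration, not the query of Figure 1: the sample data, the column name item_count, and the absolute threshold MIN_SUPPORT are assumptions. Support is computed as COUNT(*) per item, which equals the number of transactions containing the item only if SALES holds no duplicate (TID, item) tuples.

```python
import sqlite3

MIN_SUPPORT = 2  # hypothetical absolute threshold (number of transactions)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALES (TID INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?)",
    [(1, "A"), (1, "B"), (2, "A"), (2, "C"), (3, "A"), (3, "B"), (4, "D")],
)

# Pass 1a: count each item; keep items meeting minimum support in C_1.
conn.execute("CREATE TABLE C_1 (item TEXT, item_count INTEGER)")
conn.execute(
    """
    INSERT INTO C_1 (item, item_count)
    SELECT item, COUNT(*) FROM SALES
    GROUP BY item
    HAVING COUNT(*) >= ?
    """,
    (MIN_SUPPORT,),
)

# Pass 1b: keep only the SALES tuples whose item is large and store them in R_1.
# The join is an equality on item.
conn.execute("CREATE TABLE R_1 (TID INTEGER, item TEXT)")
conn.execute(
    """
    INSERT INTO R_1 (TID, item)
    SELECT s.TID, s.item FROM SALES s, C_1 c
    WHERE s.item = c.item
    """
)

c1 = conn.execute("SELECT item, item_count FROM C_1 ORDER BY item").fetchall()
r1 = conn.execute("SELECT TID, item FROM R_1 ORDER BY TID, item").fetchall()
print(c1)  # [('A', 3), ('B', 2)]
print(r1)  # [(1, 'A'), (1, 'B'), (2, 'A'), (3, 'A'), (3, 'B')]
```

Building R_1 is an equi-join on item, the kind of join predicate a parallel RDBMS can execute as a hash join; SQLite uses its own join strategies, so this sketch says nothing about execution plans.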
