Parallel Execution of SQL Based Association Rule Mining - Nontraditional Database Systems

association rule mining. The SQL queries used to mine association rules are described in the third section. The evaluation on an experimental PC cluster is given in the fourth section, while the fifth section shows how a currently available parallel commercial RDBMS performs with the queries.

2 Association Rule Mining

A typical example of an association rule is “if a customer buys A and B, then 90% of such customers also buy C”. Here 90% is called the confidence of the rule. Another measure of a rule is its support.

Transactions in a retail database usually consist of an identifier and a set of items, or itemset. In the example above, {A, B, C} is an itemset. An association rule is an implication of the form X ⇒ Y, where X and Y are itemsets. An itemset X has support s if s% of transactions contain that itemset; we denote this by s = support(X). The support of the rule X ⇒ Y is support(X ∪ Y). The confidence of that rule can be written as the ratio support(X ∪ Y)/support(X).
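As a small worked illustration of these definitions, the sketch below computes the support and confidence of the rule {A, B} ⇒ {C} over a toy set of transactions. The data, the function names `support` and `confidence`, and the choice to express support as a fraction rather than a percentage are assumptions made for the example only.

```python
# Toy transactions; the items and their distribution are invented for illustration.
transactions = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "D"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Confidence of the rule X => Y, i.e. support(X ∪ Y) / support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Rule {A, B} => {C}: its support is support({A, B, C}).
rule_support = support({"A", "B", "C"}, transactions)
rule_confidence = confidence({"A", "B"}, {"C"}, transactions)
print(rule_support, rule_confidence)
```

In this toy data, two of the five transactions contain {A, B, C} and three contain {A, B}, so the rule has support 2/5 and confidence 2/3.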

The problem of mining association rules is to find all the rules that satisfy a user-specified minimum support and minimum confidence. This problem can be decomposed into two subproblems:

1. Find all combinations of items, called large itemsets, whose support is greater than the minimum support.

2. Use the large itemsets to generate the rules.

Since the first step consumes most of the processing time, the development of mining algorithms has concentrated on this step.
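The two subproblems above can be sketched as follows. This is a deliberately naive, brute-force enumeration written only to make the decomposition concrete; it is not one of the mining algorithms discussed in the literature, and the function names, toy data, and thresholds are assumptions. Following the text, an itemset is kept when its support is strictly greater than the minimum support; treating a confidence equal to the minimum as acceptable is also an assumption.

```python
from itertools import combinations

# Toy transactions, invented for illustration.
transactions = [
    {"A", "B", "C"},
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "D"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def find_large_itemsets(transactions, min_support):
    """Step 1: enumerate itemsets whose support is greater than min_support."""
    items = sorted(set().union(*transactions))
    large = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            s = support(candidate, transactions)
            if s > min_support:
                large[frozenset(candidate)] = s
                found_any = True
        if not found_any:
            break  # no large k-itemset means no larger itemset can be large
    return large


def generate_rules(large, min_confidence):
    """Step 2: split each large itemset into X => Y and keep confident rules."""
    rules = []
    for itemset, s in large.items():
        for r in range(1, len(itemset)):
            for x in combinations(sorted(itemset), r):
                x = frozenset(x)
                # Every subset of a large itemset is itself large,
                # so its support is already in the table.
                conf = s / large[x]
                if conf >= min_confidence:
                    rules.append((x, itemset - x, s, conf))
    return rules


large = find_large_itemsets(transactions, min_support=0.3)
rules = generate_rules(large, min_confidence=0.9)
for x, y, s, conf in rules:
    print(sorted(x), "=>", sorted(y), "support", s, "confidence", conf)
```

Even in this toy form, step 1 scans the transactions once per candidate itemset while step 2 only consults the table of supports already computed, which hints at why the first step dominates the processing time.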

3 Association Rule Mining Algorithms on SQL

Most of the algorithms developed to mine association rules were designed in pursuit of effectiveness, so they tend to neglect integration with existing systems. Some exceptions, such as SETM 4), reported SQL expressions of association rule mining.
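To give a feel for what expressing part of the mining in SQL can look like, the sketch below counts the item pairs that occur together in more than a given number of transactions, using a self-join and a GROUP BY. It is a minimal illustration only, not the SETM formulation nor the queries described in this paper. The table name `sales`, its columns `tid` and `item`, the toy rows, and the use of SQLite through Python's `sqlite3` module are all assumptions made for the example; it also assumes each (tid, item) pair is stored at most once.

```python
import sqlite3

# In-memory database with one row per (transaction id, item); schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
rows = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"), (2, "C"),
    (3, "A"), (3, "B"),
    (4, "A"), (4, "C"),
    (5, "B"), (5, "D"),
]
conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Minimum support expressed as an absolute number of transactions.
min_count = 1

# The self-join on tid pairs up items bought in the same transaction;
# s1.item < s2.item keeps each unordered pair exactly once.
pair_query = """
    SELECT s1.item, s2.item, COUNT(*) AS cnt
    FROM sales s1
    JOIN sales s2 ON s1.tid = s2.tid AND s1.item < s2.item
    GROUP BY s1.item, s2.item
    HAVING COUNT(*) > ?
    ORDER BY s1.item, s2.item
"""
large_pairs = conn.execute(pair_query, (min_count,)).fetchall()
print(large_pairs)
```

With the toy rows above, the pairs (A, B), (A, C), and (B, C) pass the threshold, while (B, D), which appears in only one transaction, is filtered out by the HAVING clause.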

The ability to perform data mining directly on an RDBMS using SQL provides many benefits, among them:

1. Small implementation cost

Since we can use the SQL available on all RDBMSs to do data mining, we don't have to buy expensive data mining software separately. Organizations that are still considering the introduction of full-scale data mining can easily set up an experimental dataset and test the efficiency of data mining applications using existing RDBMS capability.

2. SQL as a standard language

The popularity of SQL as a standard language for manipulating databases may shorten the time required to implement data mining using SQL.
