association rule mining. The SQL queries used to mine association rules will be described in the third section. The evaluation on an experimental PC cluster will be given in the fourth section, while the fifth section shows how currently available parallel commercial RDBMSs perform with these queries.
2 Association Rule Mining
A typical example of an association rule is “if a customer buys A and B, then 90% of such customers also buy C”. Here 90% is called the confidence of the rule. Another measure of a rule is called its support.
Transactions in a retail database usually consist of an identifier and a set of items, or itemset. {A, B, C} in the above example is an itemset. An association rule is an implication of the form $X \Rightarrow Y$, where $X$ and $Y$ are itemsets. An itemset $X$ has support $s$ if $s\%$ of transactions contain that itemset; we denote this as $s = \mathrm{support}(X)$. The support of the rule $X \Rightarrow Y$ is $\mathrm{support}(X \cup Y)$. The confidence of that rule can be written as the ratio $\mathrm{support}(X \cup Y)/\mathrm{support}(X)$.
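To make the two measures concrete, here is a minimal sketch that computes support (as a percentage) and confidence (as a ratio) directly from the definitions above. The five transactions and the function names are hypothetical, chosen only for illustration.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical toy data: each transaction is (identifier, itemset).
TRANSACTIONS: List[Tuple[int, FrozenSet[str]]] = [
    (1, frozenset({"A", "B", "C"})),
    (2, frozenset({"A", "B", "C"})),
    (3, frozenset({"A", "B"})),
    (4, frozenset({"B", "C"})),
    (5, frozenset({"A", "C", "D"})),
]


def support(itemset: FrozenSet[str], transactions) -> float:
    """Percentage of transactions that contain every item of `itemset`."""
    hits = sum(1 for _, items in transactions if itemset <= items)
    return 100.0 * hits / len(transactions)


def confidence(x: FrozenSet[str], y: FrozenSet[str], transactions) -> float:
    """Confidence of X => Y, computed as support(X u Y) / support(X)."""
    return support(x | y, transactions) / support(x, transactions)
```

Here {A, B} appears in transactions 1, 2 and 3, so its support is 60%; {A, B, C} appears in 1 and 2 (40%), giving the rule {A, B} ⇒ {C} a confidence of 40/60 ≈ 0.67, i.e. about 67%.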
The problem of mining association rules is to find all the rules that satisfy a user-specified minimum support and minimum confidence. It can be decomposed into two subproblems:
1. Find all combinations of items, called large itemsets, whose support is greater than the minimum support.
2. Use the large itemsets to generate the rules.
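The two-step decomposition can be sketched in a few lines. This is a minimal sketch under stated assumptions: step 1 uses an Apriori-style level-wise search (a commonly used technique swapped in here; the text does not prescribe one), supports are percentages, and both thresholds are treated as inclusive minimums. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def large_itemsets(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Step 1: level-wise search for all itemsets with support (%) >= min_support."""
    n = len(transactions)

    def supp(c: Itemset) -> float:
        return 100.0 * sum(1 for t in transactions if c <= t) / n

    current = {frozenset([i]) for t in transactions for i in t}  # 1-item candidates
    large: Dict[Itemset, float] = {}
    k = 1
    while current:
        # Count candidates of size k and keep only the large ones.
        level = {c: supp(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        large.update(level)
        k += 1
        # Join large (k-1)-itemsets into k-item candidates, pruning any candidate
        # that has a (k-1)-subset which is not large.
        current = set()
        for a, b in combinations(list(level), 2):
            c = a | b
            if len(c) == k and all(frozenset(s) in level for s in combinations(c, k - 1)):
                current.add(c)
    return large


def generate_rules(large: Dict[Itemset, float],
                   min_confidence: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Step 2: emit (X, Y, confidence) for X => Y with X u Y large."""
    rules = []
    for itemset, supp_xy in large.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                x = frozenset(lhs)
                conf = supp_xy / large[x]  # every subset of a large itemset is large
                if conf >= min_confidence:
                    rules.append((x, itemset - x, conf))
    return rules
```

For the transactions {A,B,C}, {A,B,C}, {A,B}, {B,C}, {A,C,D} with a 40% minimum support, step 1 finds seven large itemsets; with a minimum confidence of 0.7, step 2 keeps the six two-item rules such as {A} ⇒ {B}, each with confidence 0.75. Step 2 only does lookups in the dictionary built by step 1, which is consistent with the observation that step 1 dominates the cost.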
Since the first step consumes most of the processing time, the development of mining algorithms has concentrated on this step.
3 Association Rule Mining Algorithms on SQL
Most of the algorithms developed to mine association rules were intended to pursue effectiveness, and so they have tended to neglect integration with existing systems. Some exceptions, such as SETM 4), have reported SQL expressions of association rule mining.
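To give a flavor of what an SQL expression of support counting can look like, here is a minimal sketch that counts 2-itemsets with a self-join over a normalized table. It is neither SETM's formulation nor the specific queries proposed in this paper; the table `sales(tid, item)`, the sample rows and `large_pairs` are hypothetical, SQLite (through Python's standard `sqlite3` module) stands in for a production RDBMS, and minimum support is given as an absolute transaction count.

```python
import sqlite3

# Hypothetical normalized layout: one row per (transaction id, item).
ROWS = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"), (2, "C"),
    (3, "A"), (3, "B"),
    (4, "B"), (4, "C"),
    (5, "A"), (5, "C"), (5, "D"),
]


def large_pairs(min_count: int):
    """Return (item1, item2, count) for item pairs bought together >= min_count times."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", ROWS)
    # The self-join pairs items of the same transaction; a.item < b.item keeps
    # each unordered pair once. GROUP BY counts pairs, HAVING applies min support.
    query = """
        SELECT a.item, b.item, COUNT(*) AS cnt
        FROM sales AS a
        JOIN sales AS b ON a.tid = b.tid AND a.item < b.item
        GROUP BY a.item, b.item
        HAVING COUNT(*) >= ?
        ORDER BY a.item, b.item
    """
    result = conn.execute(query, (min_count,)).fetchall()
    conn.close()
    return result
```

With `min_count=3` this returns the pairs (A, B), (A, C) and (B, C), each with count 3. The query counts every co-occurring pair; a real mining algorithm would normally restrict the join to candidates built from items that are already known to be large.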
The ability to do data mining directly on an RDBMS using SQL provides many benefits, among others:
1. Small implementation cost
   Since we can use the SQL available on all RDBMSs to do data mining, we do not have to buy expensive data mining software separately. Organizations that are still considering the introduction of full-scale data mining can easily set up an experimental dataset and test the efficiency of data mining applications using existing RDBMS capabilities.
2. SQL as a standard language
   The popularity of SQL as the standard language for manipulating databases may shorten the time required to implement data mining using SQL.
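As an illustration of how little is needed to try this on an existing engine, the following sketch loads a small experimental dataset and finds the large 1-itemsets using only plain SQL (`GROUP BY`, `HAVING`, `COUNT(DISTINCT ...)` and a scalar subquery). SQLite via Python's standard `sqlite3` module is an assumed stand-in for whatever RDBMS an organization already runs; the table `sales(tid, item)`, the rows and `large_items` are hypothetical.

```python
import sqlite3

# Hypothetical experimental dataset: one row per (transaction id, item).
ROWS = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"), (2, "C"),
    (3, "A"), (3, "B"),
    (4, "B"), (4, "C"),
    (5, "A"), (5, "C"), (5, "D"),
]


def large_items(rows, min_support_pct: float):
    """Return (item, support %) for every single item with support >= min_support_pct."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # Support = transactions containing the item / total transactions, as a percentage.
    # The expression is repeated in HAVING rather than relying on a column alias,
    # to stay within widely supported SQL.
    query = """
        SELECT item,
               100.0 * COUNT(DISTINCT tid) / (SELECT COUNT(DISTINCT tid) FROM sales)
        FROM sales
        GROUP BY item
        HAVING 100.0 * COUNT(DISTINCT tid) / (SELECT COUNT(DISTINCT tid) FROM sales) >= ?
        ORDER BY item
    """
    result = conn.execute(query, (min_support_pct,)).fetchall()
    conn.close()
    return result
```

With a 40% threshold, items A, B and C (80% each) qualify, while D (20%) does not.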