and included them in the large itemset table C_k. Finally, the transaction data R_k of
length k is generated by matching the items in the candidate itemset table RTMP_k with
the items in the large itemsets.
4.2 Enhanced SETM query using view materialize technique
SETM has to materialize its temporary tables, namely R_k and RTMP_k. These
temporary tables are required only in the next pass and are not needed for
generating the rules. In fact, they can be deleted after the subsequent pass has
executed. Based on this observation, we can avoid the materialization cost of
these temporary tables by replacing table creation with view creation.
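The idea can be illustrated with a minimal sketch, assuming SQLite driven from Python and a hypothetical transaction table SALES(tid, item). The table name, the sample rows and the support threshold are assumptions made for illustration; the actual SETM queries are not reproduced here. The candidate 2-item combinations RTMP_2 are declared as a view rather than created as a table, so nothing is written until the counting query reads them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed schema: one row per (transaction id, item).
cur.execute("CREATE TABLE SALES (tid INTEGER, item TEXT)")
cur.executemany(
    "INSERT INTO SALES VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"), (3, "a"), (3, "c")],
)

# Candidate 2-item combinations defined as a view instead of a temporary table.
cur.execute("""
    CREATE VIEW RTMP_2 AS
    SELECT p.tid AS tid, p.item AS item1, q.item AS item2
    FROM SALES p JOIN SALES q
      ON p.tid = q.tid AND p.item < q.item
""")

min_support = 2  # assumed absolute support threshold

# Large 2-itemsets are counted directly from the view.
large_2 = cur.execute("""
    SELECT item1, item2, COUNT(*) AS cnt
    FROM RTMP_2
    GROUP BY item1, item2
    HAVING COUNT(*) >= ?
    ORDER BY item1, item2
""", (min_support,)).fetchall()

print(large_2)
```

Whether the view is actually cheaper than a materialized table depends on the database system and its optimizer; the sketch only shows the structural change.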
4.3 Enhanced SETM query using subquery technique
We expect a significant performance improvement from the use of views. However,
creating a view still requires time to access the system catalog and holds locks on
the system catalog table, so we go a step further and use subqueries instead of
temporary tables. That is, we embed the generation of item combinations into
the query that generates the large itemsets.
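As a rough sketch of the subquery variant, again assuming SQLite and a hypothetical SALES(tid, item) table with made-up rows, the item combinations can be written as a derived table inside the counting query. No view or temporary table is created, so the catalog contains only the base table afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: one row per (transaction id, item).
conn.execute("CREATE TABLE SALES (tid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"), (3, "a"), (3, "c")],
)

min_support = 2  # assumed absolute support threshold

# The combination step is embedded as a subquery in the FROM clause.
large_2 = conn.execute("""
    SELECT item1, item2, COUNT(*) AS cnt
    FROM (
        SELECT p.tid AS tid, p.item AS item1, q.item AS item2
        FROM SALES p JOIN SALES q
          ON p.tid = q.tid AND p.item < q.item
    ) AS combos
    GROUP BY item1, item2
    HAVING COUNT(*) >= ?
    ORDER BY item1, item2
""", (min_support,)).fetchall()

# Only the base table is registered in the catalog.
catalog_objects = [
    row[0] for row in conn.execute("SELECT name FROM sqlite_master ORDER BY name")
]

print(large_2)
print(catalog_objects)
```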
4.4 Apriori SQL
Sarawagi et al. proposed SQL queries to mine association rules based on the
Apriori algorithm 11) . We omit the details of the queries due to space limitations.
The queries differ from SETM in that they first generate the candidate large itemsets
before the support counting process. The candidate table at pass k is generated by
joining two copies of the large itemsets of length ( k -1) from the previous pass. The
join result, a set of k -itemsets, is further pruned using the subset pruning strategy:
all subsets of a large itemset must themselves be frequent.
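Candidate generation with subset pruning can be sketched for k = 3 as follows. This is an illustrative query under assumed names (a table F2(item1, item2) of large 2-itemsets with items stored in sorted order, and invented rows), not the query from the cited work. Two copies of F2 are joined on their first item, and a third copy checks that the remaining 2-item subset is also large.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed input: large 2-itemsets, each stored with item1 < item2.
conn.execute("CREATE TABLE F2 (item1 TEXT, item2 TEXT)")
conn.executemany(
    "INSERT INTO F2 VALUES (?, ?)",
    [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")],
)

# i1 and i2 share the first item and are extended by i2's last item.
# i3 enforces subset pruning: (item2, item3) must itself be a large 2-itemset.
# The other two subsets, (item1, item2) and (item1, item3), are i1 and i2.
candidates_3 = conn.execute("""
    SELECT i1.item1, i1.item2, i2.item2 AS item3
    FROM F2 i1
    JOIN F2 i2 ON i1.item1 = i2.item1 AND i1.item2 < i2.item2
    JOIN F2 i3 ON i3.item1 = i1.item2 AND i3.item2 = i2.item2
    ORDER BY 1, 2, 3
""").fetchall()

print(candidates_3)
```

In this sample the join alone would produce (a, b, c) and (b, c, d); the latter is pruned because (c, d) is not a large 2-itemset.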
They also propose several methods for support counting, such as K-way join,
3-way join, 2Groupby and Subquery. The Subquery method is reported to have the
best performance.
However, they also pointed out that the performance of pure SQL-92
implementations is far behind that of their counterparts, such as native programs or
OODB based queries that utilize user defined functions (UDFs).
4.5 Set-oriented Apriori
After investigating several execution plans of Apriori SQL, some modifications
were proposed to improve performance 12) . The modifications are: pruning
non-frequent items from the transaction database after the first pass; eliminating
candidate generation in the second pass, since the candidates are too numerous
to materialize; and reusing the item combinations from the previous pass in a
similar way to SETM.
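The first modification, pruning non-frequent items from the transaction data after the first pass, might look like the following sketch. The tables SALES and SALES_PRUNED, the sample rows and the threshold are all assumptions for illustration; the cited work may express this step differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema: one row per (transaction id, item).
conn.execute("CREATE TABLE SALES (tid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "z"), (2, "a"), (2, "b"), (3, "a"), (3, "y")],
)

min_support = 2  # assumed absolute support threshold

# Keep only rows whose item is frequent on its own (result of the first pass).
conn.execute("CREATE TABLE SALES_PRUNED (tid INTEGER, item TEXT)")
conn.execute("""
    INSERT INTO SALES_PRUNED
    SELECT tid, item
    FROM SALES
    WHERE item IN (
        SELECT item FROM SALES GROUP BY item HAVING COUNT(*) >= ?
    )
""", (min_support,))

pruned = conn.execute(
    "SELECT tid, item FROM SALES_PRUNED ORDER BY tid, item"
).fetchall()

print(pruned)
```

Later passes would then join against SALES_PRUNED instead of SALES, so rows for items such as y and z never enter the item combinations.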