Association rule mining is known to be a CPU-intensive application. This has
driven much of the early research in data mining toward efficient mining methods
such as Apriori 2) and its improvements 9) 3). Some algorithms are already
available as commercial packages. Most of them assume that the data is stored in
a flat file system. In most cases, however, the data is managed by an RDBMS, so
one has to export the data from the database and perform the mining with
specialized software outside the database. Some software packages also provide
access to the database through a cursor interface 7).
However, an RDBMS has sophisticated query processing capabilities through the
standard language SQL. There have therefore been recent efforts to perform data
mining within the relational database system, which offers advantages such as
seamless integration with existing systems and high portability. The methods
examined range from using SQL directly to extensions such as user-defined
functions (UDF) 11). Other efforts have aimed to couple the RDBMS more tightly
with the association rule mining system. For example, DMQL 5) and M-SQL 8)
proposed extensions to standard SQL to handle mining operators.
A pure SQL-92 approach is interesting because SQL-92 is a standard supported by
most database systems, which means it offers the highest level of portability
and flexibility. Unfortunately, the SQL approach is reported to suffer from poor
performance.
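To make the pure SQL approach concrete, the following is a minimal sketch of how
the first two passes of frequent itemset counting can be written with plain
joins and GROUP BY. It is an illustration only and is not the modified SETM
query evaluated in this paper. The table names (sales, f1, f2), the function
name frequent_itemsets, and the use of Python with SQLite as a stand-in for an
RDBMS are all assumptions made for the example.

```python
import sqlite3


def frequent_itemsets(transactions, min_support):
    """Count frequent 1- and 2-itemsets with plain SQL (illustrative sketch).

    `transactions` is a list of item lists; `min_support` is an absolute count.
    All table and column names here are hypothetical.
    """
    conn = sqlite3.connect(":memory:")

    # One row per (transaction, item), the usual layout for SQL-based mining.
    conn.execute("CREATE TABLE sales (trans_id INTEGER, item TEXT)")
    rows = [(tid, item)
            for tid, items in enumerate(transactions)
            for item in set(items)]  # ignore duplicate items in a transaction
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

    # Pass 1: items that appear in at least min_support transactions.
    conn.execute("CREATE TABLE f1 (item TEXT, cnt INTEGER)")
    conn.execute(
        "INSERT INTO f1 "
        "SELECT item, COUNT(*) FROM sales "
        "GROUP BY item HAVING COUNT(*) >= ?",
        (min_support,),
    )

    # Pass 2: self-join on trans_id to form item pairs, restricted to
    # frequent items, then count and filter by support.
    conn.execute("CREATE TABLE f2 (item1 TEXT, item2 TEXT, cnt INTEGER)")
    conn.execute(
        "INSERT INTO f2 "
        "SELECT a.item, b.item, COUNT(*) "
        "FROM sales a, sales b, f1 fa, f1 fb "
        "WHERE a.trans_id = b.trans_id AND a.item < b.item "
        "  AND a.item = fa.item AND b.item = fb.item "
        "GROUP BY a.item, b.item HAVING COUNT(*) >= ?",
        (min_support,),
    )

    singles = dict(conn.execute("SELECT item, cnt FROM f1").fetchall())
    pairs = {(i1, i2): cnt
             for i1, i2, cnt in conn.execute("SELECT item1, item2, cnt FROM f2")}
    conn.close()
    return singles, pairs


if __name__ == "__main__":
    data = [["bread", "milk"], ["bread", "butter", "milk"],
            ["bread", "butter"], ["milk"]]
    print(frequent_itemsets(data, 2))
```

Each pass is a join followed by an aggregation, which is the kind of complex
query whose cost motivates the performance concern noted above.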
We proposed a large-scale PC cluster as a cost-effective platform for
data-intensive applications such as data mining using a parallel RDBMS, which
offers the advantages of integration without sacrificing performance 13).
There is a tradeoff between performance and portability. The performance is not
necessarily high enough, but seamless integration with an existing RDBMS would
be a considerable advantage. Since relational databases are already very
popular, the feasibility of association rule mining can be explored using
standard SQL queries instead of purchasing expensive mining software. In
addition, parallel relational databases are now widely accepted. We showed that
parallelizing the SQL execution of a modified SETM query on a PC cluster can
offer the same performance as Apriori-based native programs with 4 nodes. Since
most organizations have many PCs that are not fully utilized, such resources can
be exploited to enhance performance significantly.
On the other hand, most major commercial database systems have recently added
support for parallelization, although no report is available on how
parallelization affects the performance of the complex queries required by
association rule mining. This motivated us to examine how efficiently SQL-based
association rule mining can be parallelized and sped up using a commercial
parallel database system (IBM DB2 UDB EEE). We propose two techniques to
enhance the association rule mining query based on SETM [3]. We have also
compared its performance with that of a commercial mining tool (IBM Intelligent
Miner). Our performance evaluation shows that we can achieve performance
comparable to the commercial mining tool using only 4 nodes.
This paper consists of six sections. In the second section we briefly explain