Data Mining of Association Rules and the Process of Knowledge Discovery in Databases - Advances in Data Mining

As a result, gathering data that mirrors our world has become fairly easy

and rather inexpensive. On the one hand, the resulting data collections certainly

contain valuable and detailed information, but on the other hand, analyzing such

massive datasets has turned out to be much harder than expected. In brief, sizes

ranging from tens of megabytes up to several terabytes rule out simply applying

common analysis methods. Consequently, during the last ten years specialized

techniques have been developed that can be subsumed under the term data

mining. The main goal behind these methods is to allow the efficient analysis

of even very large datasets. With its origins in machine learning, statistics and

databases, data mining has developed into a thriving and very active research

field since the early nineties.

Since its introduction in [2] the task of association rule mining has received

a great deal of attention. Today the generation of association rules is one of

the most popular data mining methods. The idea of mining association rules

originates from the analysis of market-basket data where rules like “A customer

who buys products x_1, x_2, ..., x_n will also buy product y with probability c%”

are generated.
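
The probability c in such a rule can be estimated directly from the transaction data. The following is a minimal sketch, not the formal definition given in Section 2: it assumes each transaction is a set of item names and estimates c as the share of transactions containing x_1, ..., x_n that also contain y (a quantity commonly called the confidence of the rule). The function names and the sample baskets are hypothetical and serve only as illustration.

```python
from typing import FrozenSet, Iterable, List


def rule_support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of all transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def rule_confidence(body: Iterable[str], head: Iterable[str],
                    transactions: List[FrozenSet[str]]) -> float:
    """Estimate P(head | body): among transactions containing the body,
    the fraction that also contains the head."""
    body_set = frozenset(body)
    both = body_set | frozenset(head)
    body_count = sum(1 for t in transactions if body_set <= t)
    if body_count == 0:
        raise ValueError("no transaction contains the rule body")
    both_count = sum(1 for t in transactions if both <= t)
    return both_count / body_count


# Hypothetical market-basket data, invented for illustration only.
SAMPLE_BASKETS = [
    frozenset({"bread", "butter", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk"}),
    frozenset({"butter", "milk"}),
    frozenset({"bread", "butter", "milk", "eggs"}),
]

if __name__ == "__main__":
    c = rule_confidence({"bread", "butter"}, {"milk"}, SAMPLE_BASKETS)
    print(f"bread, butter -> milk holds with probability {c:.0%}")
```

In the sample data, three baskets contain both bread and butter, and two of those also contain milk, so the estimate for this rule is 2/3.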

Their direct applicability to business problems together with their inherent

understandability - even for those who are not data mining experts - made association rules

such a popular mining method. Moreover, it became clear that association rules

are not restricted to dependency analysis in the context of retail applications

but are successfully applicable to a wide range of business problems.

In this paper we deal with association rules in the context of a complex,

interactive and iterative knowledge discovery process. In Section 2 we formally

introduce association rules and give a first example. Then, in Section 3, we turn

our attention to the process of knowledge discovery in databases (KDD) and

describe its basics. At the end of this section we finally explain the implications

for association rule mining. Concerning human involvement and interactivity, we

come to the conclusion that the situation today is still not satisfactory, but there

are several main starting points for addressing this problem:

First of all there is the algorithmic complexity. In brief, the number of rules

grows exponentially with the number of items. Fortunately, today's algorithms

are able to efficiently prune this immense search space based on minimum thresholds

for quality measures on the rules. We deal with the details of rule generation

in Section 4.
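
To make the pruning idea concrete, the following is a minimal sketch under stated assumptions, not the algorithm presented in Section 4. It assumes the quality measure is support (the fraction of transactions containing an itemset) and relies on the fact that an itemset can only reach the threshold if all of its subsets do, so candidates are built level by level from itemsets that already passed. The helper `possible_rule_count` uses the standard count 3^n - 2^(n+1) + 1 of rules with non-empty, disjoint body and head over n items. All names and the sample data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def possible_rule_count(n_items: int) -> int:
    """Number of rules X -> Y with X, Y non-empty and disjoint over n items."""
    return 3 ** n_items - 2 ** (n_items + 1) + 1


def frequent_itemsets(transactions: List[FrozenSet[str]],
                      min_support: float) -> Dict[FrozenSet[str], float]:
    """Level-wise search that keeps only itemsets meeting `min_support`."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # size-1 candidates
    result: Dict[FrozenSet[str], float] = {}
    size = 1
    while current:
        # Count each candidate and keep those that reach the threshold.
        frequent = {}
        for cand in current:
            count = sum(1 for t in transactions if cand <= t)
            if count / n >= min_support:
                frequent[cand] = count / n
        result.update(frequent)

        # Build candidates one item larger; discard any candidate that has
        # a subset which failed the threshold (this is the pruning step).
        size += 1
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != size:
                continue
            if all(frozenset(s) in frequent for s in combinations(union, size - 1)):
                candidates.add(union)
        current = list(candidates)
    return result


# Hypothetical transactions, invented for illustration only.
SAMPLE_TRANSACTIONS = [
    frozenset({"a", "b", "c"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"b", "c"}),
    frozenset({"a", "b", "c", "d"}),
]

if __name__ == "__main__":
    print(possible_rule_count(4))  # rules possible over the 4 items a, b, c, d
    for itemset, sup in frequent_itemsets(SAMPLE_TRANSACTIONS, 0.5).items():
        print(sorted(itemset), sup)
```

Because item d falls below the threshold at the first level, no larger itemset containing d is ever counted, which is where the savings come from on realistic data.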

Second, the data to be mined is typically stored in a relational database management

system. Therefore, efficient and elegant integration with modern database

systems is one of the key factors in practical mining applications. The reason is

that simple solutions like flat-file extraction of the data quickly reach their limits

in the context of massive datasets and repeated algorithm runs. A solution to

this problem is given in Section 5.

Third, interesting rules must be picked from the set of generated rules. This

might be quite costly because the generated rule sets are normally quite large -

e.g. more than 100,000 rules are not uncommon - while the percentage

of useful rules is typically only a very small fraction. In Section 6 we enhance
