Data Mining of Association Rules and the Process of Knowledge Discovery in Databases - Advances in Data Mining


given to the algorithm? Etc. Second, one needs to investigate whether the

mining results are sound from the mining method's point of view. Some

methods directly support this decision by computing certain significance

measures whereas others leave this aspect completely to the analyst and his

experience. Third, a key objective of the evaluation phase is to determine if

all important business issues have been considered adequately.

6. Deployment

After mining the data and assessing the data mining results one needs to

transfer the results back into the business environment. This can be rather

straightforward, like preparing the results in the form of a report that is

understandable by business people (who of course typically are not data mining

experts). Or, at the other extreme, it can be quite complex, like implementing

a repeatable data mining process across the enterprise.

Although there are many competing process descriptions, e.g. [1,

4, 8, 5, 9, 10, 30, 32], all agree concerning the basic character of the KDD process:

KDD is by no means a push-button technology. That is to say, the analyst never

walks strictly through the preprocessing tasks, mines the data, and then analyzes

and deploys the results. Rather, knowledge discovery is complex, iterative and

highly interactive. In each of the phases sketched above it is the analyst as a

human being who decides whether to proceed to the next phase, to redo the

current phase or even to step back to one of the former phases. In Figure 1 the

most important of these interdependencies between the phases are indicated by

arrows. The cycle around the process indicates the overall cyclic character of a

KDD process.

Obviously the analyst's creativity and experience play a major part in such a

human-centered process. Of course this nontrivial character of the KDD process

imposes constraints on the employed data mining methods.

3.2 Association Rule Mining and the KDD Process

The key to human involvement is to enable analysts to interact easily with both

data and mining results. To illustrate this point further, let's look at a concrete

example from dependency analysis on the features of cars: in the beginning an

analyst's goal is to obtain a general “feeling” for the data. The issued mining

queries are not focused but try to capture the whole available search space. As a

consequence, the resulting rule sets are typically huge and easily overwhelm the

analyst. Up to several tens of thousands of rules are not uncommon. In the initial phase

the analyst identifies promising starting points for his further investigations. The

challenge is to do this on the basis of rule sets containing a large portion of noise,

trivial rules, or otherwise uninteresting associations.
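
The effect of an unfocused mining query can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy car data, the feature names, the function names, and the choice of support and confidence as the rule quality measures. The rules are enumerated by brute force, which is not the algorithm discussed in this chapter and is feasible only for tiny examples.

```python
from itertools import combinations

# Hypothetical toy data: each set lists the features of one car.
CARS = [
    {"diesel", "air_conditioning", "tow_bar"},
    {"diesel", "air_conditioning"},
    {"petrol", "sunroof"},
    {"petrol", "air_conditioning", "sunroof"},
    {"diesel", "tow_bar"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(transactions, min_support, min_confidence):
    """Enumerate all rules body -> head that meet both thresholds.

    Brute force over all itemsets; suitable for a toy example only.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            supp = support(itemset, transactions)
            # Skip itemsets that never occur or are too rare.
            if supp == 0 or supp < min_support:
                continue
            # Split the itemset into every non-empty body and head.
            for k in range(1, size):
                for body_items in combinations(candidate, k):
                    body = frozenset(body_items)
                    conf = supp / support(body, transactions)
                    if conf >= min_confidence:
                        rules.append((body, itemset - body, supp, conf))
    return rules


# Low thresholds capture more of the search space and return more rules.
unfocused = mine_rules(CARS, min_support=0.2, min_confidence=0.5)
stricter = mine_rules(CARS, min_support=0.4, min_confidence=0.8)
print(len(unfocused), "rules with low thresholds")
print(len(stricter), "rules with stricter thresholds")
```

Even on five cars with five features, the low thresholds return noticeably more rules than the stricter ones. On real data with many features the same effect produces the very large rule sets described above.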

After this orientation phase the analyst typically decides to take a closer

look at a subset of the vehicle features. For example, he focuses on rules that

contain special equipment together with information on the engine type installed.

He lowers thresholds for some rule quality measures, implying a rerun of the

algorithm. The results are not as expected, and the analyst suspects that the

results might be more convincing if the algorithm is only applied to a subset of
