Data Mining of Association Rules and the Process of Knowledge Discovery in Databases - Advances in Data Mining


the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. [10]

In this context, "pattern" is meant in a very general sense. A pattern is whatever a data mining algorithm may find in or generate from the data, e.g. a model that scores customers based on a decision tree or a neural network, a clustering of the data, or a set of association rules. Whereas the requirements of validity, novelty, usefulness and understandability of these patterns are largely self-explanatory, the implications of the term “nontrivial process” might not be obvious at first glance and are worth a closer look.

3.1 The Phases of the KDD Process

A KDD process consists of several tasks. In fact, the actual mining, that is, the application of a data mining algorithm to a dataset, is only one of these tasks. Following the CRISP-DM model (Cross-Industry Standard Process for Data Mining) [9,31], we distinguish the following tasks:

1. Business Understanding

The very first step of a KDD project should be a close look at it from the business point of view. The goal of this phase is to gain a deeper understanding of the project objectives and the surrounding circumstances, strictly from the business perspective. Finally, the insights from this initial phase are turned into a data mining problem definition.

2. Data Understanding

Building on the results of the business understanding phase, the second step is to get familiar with the available data. The goal is to understand the attributes and their values and to uncover any semantics that may be hidden in the data. Furthermore, at this stage one should determine what exactly the available data offers, that is, whether it has the potential to answer the mining questions at all, and, if possible, select promising subsets of the data.
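As a rough illustration of this step, the following Python sketch profiles a small in-memory dataset: for every attribute it counts missing values and distinct values and lists the most frequent ones. It assumes, purely for illustration, that records are plain dicts and that `None` or an empty string marks a missing value; the function name `profile_attributes` and the sample attributes are hypothetical and not taken from the text.

```python
from collections import Counter


def profile_attributes(records):
    """Summarize every attribute of a list of dict records (hypothetical helper)."""
    # Collect all attribute names that occur in any record.
    attributes = sorted({attr for record in records for attr in record})
    profile = {}
    for attr in attributes:
        # A record lacking the attribute yields None and counts as missing.
        values = [record.get(attr) for record in records]
        # Assumption for this sketch: None and "" both mean "missing".
        present = [v for v in values if v not in (None, "")]
        profile[attr] = {
            "missing": len(values) - len(present),
            "distinct": len(set(present)),
            "most_common": Counter(present).most_common(3),
        }
    return profile


if __name__ == "__main__":
    customers = [
        {"id": 1, "gender": "f", "region": "north"},
        {"id": 2, "gender": "m", "region": ""},
        {"id": 3, "gender": "f", "region": "south"},
    ]
    profile = profile_attributes(customers)
    print(profile["gender"])
    # {'missing': 0, 'distinct': 2, 'most_common': [('f', 2), ('m', 1)]}
```

A profile like this only shows what the values look like; judging whether the data can answer the mining questions still requires the business context from the first phase.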

3. Data Preparation

The next step is to construct the dataset on which the mining algorithm is to be run. This phase covers both syntactic aspects (format transformations required by the employed mining algorithm) and semantic aspects such as table, record and attribute selection. Last but not least, this phase also includes deriving new attributes that make explicit higher-level information only implicitly contained in the raw data (e.g. deriving “day of the week” from “date”).
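The “day of the week” example can be sketched in a few lines of Python. This is a minimal sketch, assuming records are dicts whose `date` attribute holds an ISO-8601 string such as `"2024-03-15"`; the function `add_day_of_week` and the attribute names are hypothetical. Weekday names come from a fixed English table indexed by `date.weekday()` (Monday is 0) rather than from `strftime("%A")`, so the result does not depend on the locale.

```python
from datetime import date

# Fixed English weekday names, indexed by date.weekday() (Monday == 0).
WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")


def add_day_of_week(records, date_attr="date", new_attr="day_of_week"):
    """Return copies of the records with a derived weekday attribute.

    Assumes each record[date_attr] is an ISO-8601 date string (hypothetical format).
    """
    derived = []
    for record in records:
        d = date.fromisoformat(record[date_attr])
        new_record = dict(record)  # copy, so the raw data stays untouched
        new_record[new_attr] = WEEKDAYS[d.weekday()]
        derived.append(new_record)
    return derived


if __name__ == "__main__":
    transactions = [
        {"customer": "C1", "date": "2024-03-15", "amount": 42.0},
        {"customer": "C2", "date": "2024-03-16", "amount": 17.5},
    ]
    prepared = add_day_of_week(transactions)
    print([r["day_of_week"] for r in prepared])  # ['Friday', 'Saturday']
```

Copying each record instead of modifying it in place keeps the raw data available, so the derivation can be repeated or changed later without reloading the source.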

4. Modeling (or Mining)

In the modeling phase the actual data mining takes place. Based on the identified business goals and the assessment of the available data, an appropriate mining algorithm is chosen and run on the prepared data.

5. Evaluation

Evaluating the results of the mining run mainly covers three aspects. First of all, it is necessary to verify that everything went right from the technical point of view. Was the mining algorithm able to read and interpret the prepared dataset correctly? Was all designated information actually
