A Case-Based Data Mining Platform - Data Mining: Theory, Methodology, Techniques, and Applications


Data mining practice in industry depends heavily on experienced data mining professionals to provide solutions. Because such professionals are scarce, data mining practice has become quite expensive and time-consuming.

In this paper, we propose a case-based data mining platform. It uses the knowledge captured in past data mining cases to formulate semi-automatic data mining solutions for typical business problems. Knowledge reuse is the key to this platform. To enable knowledge reuse, we must address three issues: what knowledge in the data mining process is reusable, how to represent that knowledge, and how to put it to use.

In the remainder of this paper, we first discuss the extensions to the generic data mining model for knowledge reuse in Section 2. We define a data mining case in Section 3. In Section 4, we look at the case-based data mining platform in terms of its storage base, functional modules, user interface, and application scenario. The last section gives a brief conclusion.

2 Extending Data Mining Model for Knowledge Reuse

Data mining, as a technique, has been investigated for several decades. The generic data mining model can be described simply as using historical data to generate a useful model. This generic model has often been extended for particular purposes or application domains. For example, Kotasek and Zendulka [6] take domain knowledge into consideration in their data mining model, MSMiner [11] integrates ETL and a data warehouse into its data mining model, and the CWM [8] treats data mining as one of its analysis functions. Here, to support knowledge reuse, we also need to extend this generic data mining model.

The first extension is to decouple the algorithms from the data mining system. That is, data mining algorithms can be implemented externally and called by the data mining system. This kind of extension has already been widely implemented in data mining libraries such as the Visual Basic data mining library [12] and WEKA [14]. We recall it here to show the roadmap of our model's extensions. Similarly, to reduce the dependence of the data mining system on its input and output, we use a data base to store its input data externally and a model base to store its output models externally. Thus, a data mining system is associated with a data storage base, an algorithm storage base, and a model storage base.
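The decoupling described above can be illustrated with a minimal Python sketch. It is not the platform's actual design: the class names `StorageBase` and `DataMiningSystem`, the method `mine`, the toy algorithm `majority_class`, and all keys and sample records are hypothetical and chosen only for illustration. The sketch assumes that each storage base can be modeled as a simple keyed store and that an externally implemented algorithm is a callable from training data to a model.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types for illustration; none of these names come from the paper.
Record = Tuple[Dict[str, Any], str]        # (features, class label)
Algorithm = Callable[[List[Record]], Any]  # training data -> model


class StorageBase:
    """In-memory keyed store standing in for an external storage base."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def put(self, key: str, item: Any) -> None:
        self._items[key] = item

    def get(self, key: str) -> Any:
        return self._items[key]


class DataMiningSystem:
    """Owns no data, algorithm, or model; it only references the three bases."""

    def __init__(self, data_base: StorageBase, algorithm_base: StorageBase,
                 model_base: StorageBase) -> None:
        self.data_base = data_base
        self.algorithm_base = algorithm_base
        self.model_base = model_base

    def mine(self, data_key: str, algorithm_key: str, model_key: str) -> Any:
        data = self.data_base.get(data_key)                 # input stored externally
        algorithm = self.algorithm_base.get(algorithm_key)  # algorithm looked up by name
        model = algorithm(data)                             # call the external algorithm
        self.model_base.put(model_key, model)               # output stored externally
        return model


def majority_class(records: List[Record]) -> str:
    """Toy externally implemented algorithm: the model is the most frequent label."""
    labels = Counter(label for _, label in records)
    return labels.most_common(1)[0][0]


data_base, algorithm_base, model_base = StorageBase(), StorageBase(), StorageBase()
data_base.put("customers", [({"age": 30}, "gold"),
                            ({"age": 45}, "silver"),
                            ({"age": 52}, "gold")])
algorithm_base.put("majority_class", majority_class)

system = DataMiningSystem(data_base, algorithm_base, model_base)
model = system.mine("customers", "majority_class", "customer_segment_v1")
```

In this sketch, a new algorithm or data set is added by registering it in the corresponding base, without changing `DataMiningSystem` itself.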

The second extension is to use the processing flows generated in past data mining solutions to solve new, similar problems. Even though data mining as a whole has well-understood processing steps, the processing flow of a concrete data mining solution may differ from that of others when the solutions belong to different industry types, have different data mining tasks, or have different expectations of the output model. For example, the process of building a customer classification model for the automobile industry may be quite different from the process of building a prediction model for the telecommunications industry. A processing flow records information such as what data were used in the process, what operators were involved, what model(s) were generated, and, most importantly, how these data, operators, and model(s) are connected in sequence. By contrast, for applications that have the same industry type, the same data mining task, and the same expectation of the output model,
