In industry, data mining practice depends heavily on experienced data mining
professionals to provide solutions. Because such professionals are scarce, data
mining practice has become expensive and time-consuming.
In this paper, we propose a case-based data mining platform. It uses the
knowledge captured in past data mining cases to formulate semi-automatic data
mining solutions for typical business problems. Knowledge reuse is the key to this
platform. To enable knowledge reuse, we must address three issues: what the
reusable knowledge in the data mining process is, how to represent it, and how to
put it to use.
In the remainder of this paper, we first discuss extensions of the generic data
mining model for knowledge reuse in Section 2. We define a data mining case in
Section 3. In Section 4, we describe the case-based data mining platform in terms
of its storage base, functional modules, user interface, and application scenario.
The last section gives a brief conclusion.
2 Extending Data Mining Model for Knowledge Reuse
Data mining, as a technique, has been investigated for several decades. The generic
data mining model can be described simply as using historical data to generate a
useful model. This generic model has often been extended for particular purposes or
application domains. For example, Kotasek and Zendulka [6] take domain
knowledge into consideration in their data mining model, MSMiner [11]
integrates ETL and a data warehouse into its data mining model, and CWM [8]
treats data mining as one of its analysis functions. To support knowledge
reuse, we also need to extend this generic data mining model.
The first extension is to decouple the algorithms from the data mining system. That
is, data mining algorithms can be implemented externally and called by the data
mining system. This kind of extension has already been widely implemented in data
mining libraries such as the Visual Basic data mining library [12] and WEKA [14]. We
recall it here to show the roadmap of our model's extensions.
Similarly, to reduce the dependence of the data mining system on its input and
output, we use a data base to store its input data externally and a model base to
store its output models externally. Thus, a data mining system is associated with a
data storage base, an algorithm storage base, and a model storage base.
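The first extension can be illustrated with a minimal sketch. It assumes, purely for illustration, that each storage base is an in-memory dictionary and that an algorithm is any callable that maps input rows to a model. The names MiningSystem, run, and majority_class are hypothetical and do not come from the platform described in this paper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical type for illustration: an algorithm maps input rows to a model.
Algorithm = Callable[[List[dict]], Any]


@dataclass
class MiningSystem:
    """A mining system that holds no data, algorithms, or models of its own.

    Everything it needs lives in three external storage bases
    (modelled here as plain dictionaries, which is an assumption).
    """
    data_base: Dict[str, List[dict]] = field(default_factory=dict)
    algorithm_base: Dict[str, Algorithm] = field(default_factory=dict)
    model_base: Dict[str, Any] = field(default_factory=dict)

    def run(self, data_key: str, algorithm_key: str, model_key: str) -> Any:
        # Fetch the input data and the externally implemented algorithm,
        # apply the algorithm, and store the resulting model externally.
        data = self.data_base[data_key]
        algorithm = self.algorithm_base[algorithm_key]
        model = algorithm(data)
        self.model_base[model_key] = model
        return model


def majority_class(rows: List[dict]) -> str:
    """Toy external 'algorithm': the model is the most frequent label."""
    labels = [row["label"] for row in rows]
    return max(set(labels), key=labels.count)


system = MiningSystem()
system.data_base["customers"] = [
    {"label": "loyal"}, {"label": "loyal"}, {"label": "churn"},
]
system.algorithm_base["majority"] = majority_class
model = system.run("customers", "majority", "customer_baseline")
```

Because the system only looks up keys in the three bases, a new algorithm or data set can be registered without changing the system itself, which is the kind of decoupling this extension aims for.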
The second extension is to use the processing flows generated in past data mining
solutions to solve new, similar problems. Even though data mining as a whole has
well-understood processing steps, the processing flow of a concrete data mining
project may differ from that of others when the projects belong to different
industry types, have different data mining tasks, or have different expectations of
the output model. For example, the process of building a customer classification
model for the automobile industry may be quite different from the process of
building a prediction model for the telecommunication industry. A processing flow
records what data were used in the process, which operators were involved, which
model(s) were generated, and, most importantly, how these data, operators, and
model(s) are connected in sequence. In contrast, for applications that have the same
industry type, the same data mining task, and the same expectations of the output model,