project, and deliverables have to be defined. Although deliverables have been
considered in CRISP-DM, they have been only informally defined and consequently
no comparison of outputs and no formal evaluation are possible. Regarding the
lifecycle, in this chapter we propose a first approach to Data Mining project
phases and lifecycle based on the ones defined by RUP. We also propose a
first approach towards abstraction for the project conception step.
The rest of the chapter is organized as follows. Section 2 presents
related work and reviews advances in Data Mining standardization and
approaches to data mining methodology. We also review RUP as a representative
of software development methodologies. In Sect. 3 we define a data mining
project. In particular we make a first approach to the definition of steps and
lifecycle. In Sect. 4 we focus on the first phase of a Data Mining project,
namely what we define as the project conception, to properly define a data
mining project plan. Section 5 presents preliminary conclusions and outlook.
2 Related Work
In the information age, when the data generated and stored by modern
organizations increase at an extraordinary rate, data mining tasks [9] become a
necessary and fundamental technology. Much data mining research has
focused on the development of algorithms for performing different tasks, e.g.
clustering, association and classification [1,2,5,13,15,16,19,20,24,28,30], and
on their applications to diverse domains. One major challenge in data mining,
according to [12], is getting researchers to agree on a common standard for
preprocessing tasks and on standards for applying the data mining process to
operational processes and systems. In this sense, the Predictive Model Markup
Language (PMML) [8] provides several components (Data Dictionary, Mining
Schema, Transformation Dictionary, Models) useful for producing data
mining models. The Data Dictionary includes only information about the type of
data and the range of values. Semantic information is not taken into account.
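To make the limitation concrete, the following is a minimal sketch of a PMML-style Data Dictionary, built with Python's standard xml.etree.ElementTree module. The element and attribute names (DataDictionary, DataField, Interval, Value) follow the PMML schema. The helper build_data_dictionary and the fields "age" and "segment" are hypothetical, invented here only for illustration. Each field carries a data type and a range or list of admissible values, and nothing about what the field means in the business domain.

```python
import xml.etree.ElementTree as ET


def build_data_dictionary(fields):
    """Build a PMML-style DataDictionary element from simple field specs.

    Hypothetical helper for illustration; it is not part of any PMML library.
    """
    dictionary = ET.Element("DataDictionary", numberOfFields=str(len(fields)))
    for spec in fields:
        field = ET.SubElement(
            dictionary,
            "DataField",
            name=spec["name"],
            optype=spec["optype"],
            dataType=spec["dataType"],
        )
        # Continuous fields: record the admissible range as an Interval.
        if "range" in spec:
            low, high = spec["range"]
            ET.SubElement(
                field,
                "Interval",
                closure="closedClosed",
                leftMargin=str(low),
                rightMargin=str(high),
            )
        # Categorical fields: enumerate the admissible values.
        for value in spec.get("values", []):
            ET.SubElement(field, "Value", value=value)
    return dictionary


# Hypothetical fields, invented for illustration only.
example_fields = [
    {"name": "age", "optype": "continuous", "dataType": "integer",
     "range": (0, 120)},
    {"name": "segment", "optype": "categorical", "dataType": "string",
     "values": ["retail", "corporate"]},
]

data_dictionary = build_data_dictionary(example_fields)
print(ET.tostring(data_dictionary, encoding="unicode"))
```

The resulting XML states that "age" is an integer between 0 and 120, but not, for instance, whether it is a customer's age or the age of an account; that kind of semantic information has no place in the dictionary.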
Several proposals have been developed in order to offer a guide for
implementing data mining projects [7, 22, 27].
The Common Warehouse Metamodel for Data Mining (CWM DM) [22], proposed
by the Object Management Group, introduces a CWM Data Mining
metamodel composed of the following conceptual areas: a core Mining
metamodel and metamodels representing the data mining subdomains of
Clustering, Association Rules, Supervised, Classification, Approximation, and
Attribute Importance.
The Cross-Industry Standard Process for Data Mining (CRISP-DM) was
proposed in 1997 [7] in order to establish a standard data mining process.
CRISP-DM steps include several processes:
Business Understanding focuses on understanding the project objectives
and requirements from the business perspective, then converting this