data mining process from the perspective of what a consultant should
do for a customer engagement, complete with knowledge- and
solution-transfer to the customer. For smaller-scale data mining
projects, portions of the process may be omitted; however, the general
flow still applies. Having a well-defined and thorough process is a
critical part of a successful data mining strategy.
Also in this chapter, we highlight specific advances that simplify
the traditional data mining process, thereby making data mining
more accessible to application developers. We then highlight those
parts of the data mining process that are supported by Java Data Mining
(JDM). Two of the phases, data analysis and data preparation, are
covered in more detail in Section 3.2, followed by a more in-depth
review of the modeling phase. Section 3.5 discusses how the data
mining process fits into enterprise software architectures. Section 3.6
discusses advances in automated data mining that facilitate the overall
data mining process, and concludes with a discussion of how some
vendors present and integrate data mining into business applications.
A Standardized Data Mining Process
The Cross Industry Standard Process for Data Mining, or CRISP-DM,
was a project to develop an industry- and tool-neutral data mining
process model [CRISP-DM 2006]. The CRISP-DM concept was conceived
in 1996 by DaimlerChrysler (then Daimler-Benz), SPSS (then ISL), and
NCR, and it evolved over several years, building on industry
experience, both company-internal and through consulting engagements,
and on specific user requirements. Traditionally, most data mining
projects had been one-off design and implementation efforts by highly
specialized individuals, and they suffered from budget and deadline
overruns. CRISP-DM aimed to bring data mining projects to fruition
faster and more cheaply. Projects that followed ad hoc processes
tended to be less reliable and less manageable; by standardizing the
data mining phases and by integrating and validating best practices
from experts in diverse industry sectors, CRISP-DM sought to make data
mining projects both reliable and manageable.
We should note that data mining project success depends heavily
on the data available and the quality of that data. In general, placing
greater emphasis on current and future data analysis requirements
during system and application design can greatly reduce future data
mining effort. Poor data design and organization pose one of the
greatest challenges to data mining projects.