better at automating much of the data preparation required by
algorithms. Advances in algorithms also include automated tuning of
settings to obtain optimal models for the data provided. Some
tools even select the “best” model from a set of candidate models.
However, defining the business problem, selecting the mining
function to use, and ensuring that suitable data exists and is
combined into a dataset for mining cannot currently be automated.
People knowledgeable in the domain must lead these tasks.
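As a rough illustration of the “best” model selection mentioned above,
the following Java sketch assumes that each candidate model has
already been built and scored on held-out test data, and simply keeps
the one with the highest accuracy. The `BestModelSelector` and
`Candidate` classes, the model names, and the accuracy figures are
hypothetical; real tools may rank candidates by other measures, such
as lift or cost, or use cross-validation instead of a single test set.

```java
import java.util.List;

/** Hypothetical sketch: choose the "best" model among already-scored candidates. */
final class BestModelSelector {

    /** A hypothetical candidate: a model name plus its accuracy on held-out test data. */
    static final class Candidate {
        final String name;
        final double testAccuracy;

        Candidate(String name, double testAccuracy) {
            this.name = name;
            this.testAccuracy = testAccuracy;
        }
    }

    /** Returns the candidate with the highest test accuracy; on a tie, the earliest wins. */
    static Candidate selectBest(List<Candidate> candidates) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidate models");
        }
        Candidate best = candidates.get(0);
        for (Candidate c : candidates) {
            if (c.testAccuracy > best.testAccuracy) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not results from any real tool.
        List<Candidate> candidates = List.of(
            new Candidate("decisionTree", 0.81),
            new Candidate("naiveBayes", 0.78),
            new Candidate("svm", 0.84));
        System.out.println(selectBest(candidates).name); // prints "svm"
    }
}
```

Scoring candidates on test data rather than on the build data matters
here: otherwise a model that merely memorizes its training data would
always appear to be the best.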
Another advance in automated data mining is in the area of
guided analytics, which typically takes the form of a wizard-driven
graphical user interface. Such an interface systematically prompts
the user for the information needed to define a data mining problem
and then guides the user through the various data mining steps, for
example, sampling, outlier treatment, algorithm-specific
transformations, building, testing or assessing, and applying. Tools
supporting guided analytics allow business analysts and other
non-experts in data mining to obtain reasonable results.
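The fixed step sequence that such a wizard enforces can be sketched
as a small state machine. In this hypothetical Java sketch, the enum
order fixes the order of the steps listed above, and each call to
`answer` records the user's response to the current prompt before
moving on. The `GuidedAnalyticsWizard` class and the sample answers
are illustrative assumptions, not taken from any particular vendor's
tool.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a guided-analytics wizard that walks a fixed sequence of steps. */
final class GuidedAnalyticsWizard {

    /** The steps named in the text; declaration order defines the wizard sequence. */
    enum Step { SAMPLING, OUTLIER_TREATMENT, TRANSFORMATIONS, BUILD, TEST, APPLY }

    // EnumMap iterates in declaration order, so the summary follows the step order.
    private final Map<Step, String> choices = new EnumMap<>(Step.class);
    private int current = 0;

    /** Returns the step the wizard is currently prompting for, or null when finished. */
    Step currentStep() {
        Step[] steps = Step.values();
        return current < steps.length ? steps[current] : null;
    }

    /** Records the user's answer for the current step and advances to the next one. */
    void answer(String choice) {
        Step step = currentStep();
        if (step == null) {
            throw new IllegalStateException("wizard already finished");
        }
        choices.put(step, choice);
        current++;
    }

    boolean isFinished() {
        return currentStep() == null;
    }

    /** Lists the collected choices, one "STEP: answer" line per step. */
    List<String> summary() {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Step, String> e : choices.entrySet()) {
            lines.add(e.getKey() + ": " + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        GuidedAnalyticsWizard wizard = new GuidedAnalyticsWizard();
        // Hypothetical user answers, one per step.
        String[] answers = {"random 10% sample", "clip at 3 standard deviations",
                            "bin numeric attributes", "classification",
                            "30% holdout", "score new customers"};
        for (String a : answers) {
            System.out.println("Prompt: " + wizard.currentStep());
            wizard.answer(a);
        }
        wizard.summary().forEach(System.out::println);
    }
}
```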
The JDM interface enables vendors to provide a great deal of
automation for users. Because JDM separates the mining function from
the mining algorithm, vendors can intelligently select an algorithm
and corresponding settings based on the problem and data provided.
Many of the settings for both functions and algorithms include the
option systemDetermined, which instructs the DME to determine the most
appropriate setting value automatically.
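The following Java sketch illustrates the idea behind a
systemDetermined option under simple assumptions. It is not the JDM
API: the `SystemDeterminedSketch` class, its enums, and the use of
`null` to stand for systemDetermined are hypothetical, and the
resolution rules (naive Bayes when there are more than 100 attributes,
a tree depth of roughly log2 of the row count) are invented
placeholders, not anything a real DME is known to do. The sketch shows
the two points made above: the algorithm is a choice separate from
the mining function, and any setting left unspecified is filled in by
the engine from the data.

```java
/** Hypothetical sketch: resolving "systemDetermined" settings. Not the JDM API. */
final class SystemDeterminedSketch {

    enum MiningFunction { CLASSIFICATION, CLUSTERING }

    /** Each hypothetical algorithm declares the mining function it implements. */
    enum Algorithm {
        DECISION_TREE(MiningFunction.CLASSIFICATION),
        NAIVE_BAYES(MiningFunction.CLASSIFICATION),
        K_MEANS(MiningFunction.CLUSTERING);

        final MiningFunction function;

        Algorithm(MiningFunction function) {
            this.function = function;
        }
    }

    /** User settings: a null field plays the role of "systemDetermined". */
    static final class Settings {
        final MiningFunction function;
        final Algorithm algorithm;   // null = let the engine choose
        final Integer maxTreeDepth;  // null = let the engine choose

        Settings(MiningFunction function, Algorithm algorithm, Integer maxTreeDepth) {
            this.function = function;
            this.algorithm = algorithm;
            this.maxTreeDepth = maxTreeDepth;
        }
    }

    /** Honors an explicit algorithm (if it fits the function), otherwise picks one. */
    static Algorithm resolveAlgorithm(Settings s, int attributeCount) {
        if (s.algorithm != null) {
            if (s.algorithm.function != s.function) {
                throw new IllegalArgumentException(s.algorithm + " does not support " + s.function);
            }
            return s.algorithm;
        }
        if (s.function == MiningFunction.CLUSTERING) {
            return Algorithm.K_MEANS;
        }
        // Invented heuristic for illustration: wide data gets naive Bayes, otherwise a tree.
        return attributeCount > 100 ? Algorithm.NAIVE_BAYES : Algorithm.DECISION_TREE;
    }

    /** Honors an explicit depth, otherwise uses floor(log2(rows)) clamped to [2, 20]. */
    static int resolveMaxTreeDepth(Settings s, int rowCount) {
        if (s.maxTreeDepth != null) {
            return s.maxTreeDepth;
        }
        int log2Rows = 31 - Integer.numberOfLeadingZeros(Math.max(rowCount, 1));
        return Math.max(2, Math.min(20, log2Rows));
    }

    public static void main(String[] args) {
        Settings auto = new Settings(MiningFunction.CLASSIFICATION, null, null);
        System.out.println(resolveAlgorithm(auto, 12));        // DECISION_TREE
        System.out.println(resolveMaxTreeDepth(auto, 1000));   // 9
        Settings manual = new Settings(MiningFunction.CLASSIFICATION, Algorithm.NAIVE_BAYES, 5);
        System.out.println(resolveAlgorithm(manual, 12));      // NAIVE_BAYES
        System.out.println(resolveMaxTreeDepth(manual, 1000)); // 5
    }
}
```

Rejecting an explicit algorithm that does not implement the requested
function mirrors the function/algorithm separation: the user states
what to do, and only optionally how.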
In this chapter, we introduced the CRISP-DM standard data mining
process and characterized how JDM supports its various phases. We
then looked at data analysis and preparation in greater detail,
exploring what to look for in data and how to address typical data
quality issues. Since modeling is the main focus of JDM, we explored
three principal tasks: model build, test, and apply.
In preparation for the discussion of enterprise software
architectures, we discussed the role of databases and data warehouses
in data mining. We characterized the architectures of data mining
tools and their interplay with file systems and databases. We then
looked at a larger-scale enterprise system involving data mining and
at how workflow can be used to include mining tasks in the enterprise.