Data Mining Process - Java Data Mining: Strategy, Standard, and Practice


better at automating much of the data preparation required by algorithms. Advances in algorithms also include automated settings tuning to obtain optimal models relative to the data provided. Some tools even select the “best” model from a set of candidate models. However, the process of defining the business problem, selecting the mining function to be used, and ensuring that suitable data exists and is coalesced into a dataset for mining cannot currently be automated. People knowledgeable in the domain must lead the charge in these areas.

Another advance in automated data mining is in the area of guided analytics, which typically consists of a wizard-driven graphical user interface. Such an interface systematically prompts the user for information needed to define a data mining problem and then guides the user through the various data mining steps, for example, sampling, outlier treatment, algorithm-specific transformations, building, testing or assessing, and applying. Tools supporting guided analytics allow business analysts and non-data mining experts to obtain reasonable results.
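The step-by-step guidance described above can be pictured as a small ordered workflow. The following is only a minimal sketch under stated assumptions: the class `GuidedWorkflowSketch`, the `Step` enum, and its methods are hypothetical names invented for illustration and do not come from JDM or from any particular tool. It shows one way a wizard might enforce that steps are completed in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of a guided workflow; not part of JDM or any vendor tool. */
final class GuidedWorkflowSketch {

    /** The steps named in the text, in the order a wizard might present them. */
    enum Step { SAMPLE, TREAT_OUTLIERS, TRANSFORM, BUILD, TEST, APPLY }

    private final List<Step> completed = new ArrayList<>();

    /** Returns the step the user should be prompted for next, or empty when done. */
    Optional<Step> nextStep() {
        Step[] all = Step.values();
        int done = completed.size();
        return done < all.length ? Optional.of(all[done]) : Optional.empty();
    }

    /** Marks a step as completed; rejects any attempt to skip ahead. */
    void complete(Step step) {
        Optional<Step> expected = nextStep();
        if (!expected.isPresent() || expected.get() != step) {
            throw new IllegalStateException("Expected " + expected + " but got " + step);
        }
        completed.add(step);
    }

    boolean isFinished() {
        return !nextStep().isPresent();
    }

    public static void main(String[] args) {
        GuidedWorkflowSketch wizard = new GuidedWorkflowSketch();
        while (!wizard.isFinished()) {
            Step step = wizard.nextStep().get();
            // A real wizard would collect user input for this step here.
            System.out.println("Prompting user for step: " + step);
            wizard.complete(step);
        }
    }
}
```

Modeling the steps as an ordered enum keeps the sequence in one place, so the interface cannot, for example, offer the apply step before a model has been built and tested.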

The JDM interface enables vendors to provide a great deal of automation for users. The separation of mining function from mining algorithm allows vendors to intelligently select an algorithm and corresponding settings based on the problem and data provided. Many of the settings for both functions and algorithms include an option systemDetermined. This instructs the DME to determine the most appropriate setting value automatically.
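To make the idea of a system-determined setting concrete, here is a minimal sketch, assuming a toy engine. It deliberately does not use the `javax.datamining` API: every type and method below (`SystemDeterminedSketch`, `ClusteringRequest`, `resolve`, and so on) is hypothetical, and the selection rules are invented placeholders rather than anything a real DME is known to do. An empty `Optional` stands in for the systemDetermined choice.

```java
import java.util.Optional;

/** Hypothetical sketch; none of these types come from javax.datamining. */
final class SystemDeterminedSketch {

    enum Algorithm { K_MEANS, HIERARCHICAL }

    /** A clustering request: any setting left empty is treated as system determined. */
    static final class ClusteringRequest {
        final Optional<Algorithm> algorithm;
        final Optional<Integer> clusterCount;

        ClusteringRequest(Optional<Algorithm> algorithm, Optional<Integer> clusterCount) {
            this.algorithm = algorithm;
            this.clusterCount = clusterCount;
        }

        /** The user names only the mining function; the engine decides the rest. */
        static ClusteringRequest systemDetermined() {
            return new ClusteringRequest(Optional.empty(), Optional.empty());
        }
    }

    /** The concrete values the engine will actually use. */
    static final class Resolved {
        final Algorithm algorithm;
        final int clusterCount;

        Resolved(Algorithm algorithm, int clusterCount) {
            this.algorithm = algorithm;
            this.clusterCount = clusterCount;
        }
    }

    /** Toy stand-in for an engine: explicit values win, otherwise made-up defaults apply. */
    static Resolved resolve(ClusteringRequest request, long rowCount) {
        // Placeholder rule: prefer k-means once the dataset is large.
        Algorithm algorithm = request.algorithm
                .orElseGet(() -> rowCount > 10_000 ? Algorithm.K_MEANS : Algorithm.HIERARCHICAL);
        // Placeholder rule of thumb: roughly sqrt(n / 2) clusters, but at least 2.
        int clusterCount = request.clusterCount
                .orElseGet(() -> Math.max(2, (int) Math.round(Math.sqrt(rowCount / 2.0))));
        return new Resolved(algorithm, clusterCount);
    }

    public static void main(String[] args) {
        Resolved auto = resolve(ClusteringRequest.systemDetermined(), 200);
        System.out.println(auto.algorithm + " with " + auto.clusterCount + " clusters");

        Resolved manual = resolve(
                new ClusteringRequest(Optional.of(Algorithm.K_MEANS), Optional.of(3)), 200);
        System.out.println(manual.algorithm + " with " + manual.clusterCount + " clusters");
    }
}
```

The point of the sketch is the shape of the contract: the caller states the mining function and, optionally, specific values, while anything left unspecified is resolved by the engine from the data it is given.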

3.7 Summary

In this chapter, we introduced the CRISP-DM standard data mining process and characterized how JDM supports the various phases of this process. We then looked at data analysis and preparation in greater detail, exploring what to look for in data and how to address typical data quality issues. Since modeling is the main focus of JDM, we explored three principal tasks: model build, test, and apply.

In preparation for the discussion on enterprise software architectures, we discussed the role of databases and data warehouses in data mining. We characterized the architectures of data mining tools and their interplay with file systems and databases. We then looked at a larger-scale enterprise system involving data mining and how workflow can be used to include mining tasks in the enterprise.
