Data Mining Process - Java Data Mining: Strategy, Standard, and Practice

JDM distinguishes between data that is prepared and data that is unprepared. Data miners may specify that their data is already prepared, perhaps through various extraction, transformation, and load (ETL) tools, and that the data mining tool should not transform it further. For example, if a user has already normalized a data attribute (perhaps the range of attribute age, 10 to 90, has been mapped to values between 0 and 1), the data mining tool typically should not normalize it again. Alternatively, users may specify that some data attributes are unprepared, meaning that the tool should perform the transformations it deems appropriate. JDM 2.0 further extends support for data preparation by including a framework and an explicit interface for performing common data mining transformations.
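The prepared/unprepared distinction can be sketched in plain Java. The enum `PreparationStatus` and the methods `prepare` and `minMaxNormalize` are hypothetical illustrations, not part of the JDM API. The sketch shows a tool that min-max normalizes an attribute declared unprepared and passes a prepared one through unchanged. It uses the age example, where 10 and 90 happen to be the observed minimum and maximum. It also shows why normalizing twice is harmful: re-normalizing values that are already in the 0 to 1 range stretches them again.

```java
import java.util.Arrays;

/** Hypothetical sketch (not the JDM API): honoring a per-attribute preparation status. */
final class PrepSketch {

    /** Hypothetical status flag that the user declares for each attribute. */
    enum PreparationStatus { PREPARED, UNPREPARED }

    /** Min-max normalization: maps the observed [min, max] linearly onto [0, 1]. */
    static double[] minMaxNormalize(double[] values) {
        double min = Arrays.stream(values).min().getAsDouble();
        double max = Arrays.stream(values).max().getAsDouble();
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant attribute has zero range; map every value to 0.
            out[i] = range == 0 ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }

    /** The tool transforms only attributes declared UNPREPARED. */
    static double[] prepare(double[] values, PreparationStatus status) {
        if (status == PreparationStatus.PREPARED) {
            return values.clone();          // already prepared by the user: pass through
        }
        return minMaxNormalize(values);     // unprepared: the tool picks a transformation
    }

    public static void main(String[] args) {
        double[] age = {10, 30, 50, 90};
        System.out.println(Arrays.toString(prepare(age, PreparationStatus.UNPREPARED)));
        // prints [0.0, 0.25, 0.5, 1.0]

        // Values the user already normalized; re-normalizing would stretch them.
        double[] normalized = {0.25, 0.5};
        System.out.println(Arrays.toString(prepare(normalized, PreparationStatus.PREPARED)));
        // prints [0.25, 0.5]
        System.out.println(Arrays.toString(minMaxNormalize(normalized)));
        // prints [0.0, 1.0]
    }
}
```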

3.1.4 Modeling Phase

Once a dataset is sufficiently prepared, the modeling phase begins. Practitioners often consider this phase the “fun part.” Here, the user specifies settings for mining functions and, if more control is desired, can further select algorithms and their specific settings for building models. These settings can be tuned automatically by the data mining tool or explicitly by the user. Since there are many possible algorithms or techniques for a given problem, users may try several to determine which produces the best result. Some mining algorithms may have specific data preparation requirements, so users may switch back and forth between the modeling and data preparation phases.
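The choice between a setting fixed by the user and one tuned by the tool can be sketched with a hypothetical one-dimensional k-nearest-neighbor classifier, where the number of neighbors k is the setting. If the user supplies k, it is used as-is. Otherwise, the sketch tries each candidate value and keeps the one with the best accuracy on a held-out validation set. k-NN here stands in for whatever algorithms a real tool offers. The class, method names, and toy data are illustrative assumptions, not JDM interfaces.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.OptionalInt;
import java.util.stream.IntStream;

/** Hypothetical sketch (not the JDM API): a setting the user may fix or leave to the tool. */
final class TuningSketch {

    /** Predicts a 0/1 label for x by majority vote of its k nearest training points. */
    static int knnPredict(double[] trainX, int[] trainY, double x, int k) {
        Integer[] idx = IntStream.range(0, trainX.length).boxed().toArray(Integer[]::new);
        // Arrays.sort on objects is stable, so distance ties keep training-set order.
        Arrays.sort(idx, Comparator.comparingDouble(i -> Math.abs(trainX[i] - x)));
        int ones = 0;
        for (int j = 0; j < k; j++) ones += trainY[idx[j]];
        return 2 * ones > k ? 1 : 0;   // a tied vote (even k only) predicts 0
    }

    /** Fraction of validation points whose predicted label matches the true label. */
    static double accuracy(double[] trainX, int[] trainY, double[] valX, int[] valY, int k) {
        int correct = 0;
        for (int i = 0; i < valX.length; i++) {
            if (knnPredict(trainX, trainY, valX[i], k) == valY[i]) correct++;
        }
        return (double) correct / valX.length;
    }

    /** Uses the user's k if given; otherwise tries each candidate and keeps the best. */
    static int chooseK(OptionalInt userK, int[] candidates,
                       double[] trainX, int[] trainY, double[] valX, int[] valY) {
        if (userK.isPresent()) return userK.getAsInt();        // tuned explicitly by the user
        int bestK = candidates[0];
        double bestAcc = -1.0;
        for (int k : candidates) {                             // tuned automatically by the tool
            double acc = accuracy(trainX, trainY, valX, valY, k);
            if (acc > bestAcc) { bestAcc = acc; bestK = k; }   // strict >: ties keep the earlier k
        }
        return bestK;
    }

    public static void main(String[] args) {
        double[] trainX = {1, 2, 3, 4, 10, 11, 12};
        int[] trainY    = {0, 0, 0, 1, 1, 1, 1};   // x = 4 carries a deliberately noisy label
        double[] valX   = {2.5, 4.2, 11.5};
        int[] valY      = {0, 0, 1};
        int[] candidates = {1, 3, 5};

        int tuned = chooseK(OptionalInt.empty(), candidates, trainX, trainY, valX, valY);
        System.out.println("tool-tuned k = " + tuned);   // prints tool-tuned k = 3
        int fixed = chooseK(OptionalInt.of(1), candidates, trainX, trainY, valX, valY);
        System.out.println("user k = " + fixed + ", validation accuracy = "
                + accuracy(trainX, trainY, valX, valY, fixed));
    }
}
```

With k = 1 the noisy training point at x = 4 misclassifies the validation point at 4.2. Larger k values outvote it, which is why automatic tuning settles on k = 3.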

Model assessment is also part of the modeling phase. A data mining tool will normally produce some model for almost any data given to it, whether or not the data contains meaningful knowledge or patterns. To safeguard against this, users can test supervised models, that is, those supporting classification and regression. For unsupervised models, such as association and clustering models, users can inspect the models to determine whether the results are meaningful. For example, do the clusters defined in a clustering model help in understanding customer segments, and are these segments different enough to develop a marketing strategy around them? We explore the details of the modeling phase further in Section 3.3.
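One common way to test a supervised classification model is to score it on held-out data and compare its accuracy with a trivial baseline that always predicts the most frequent class. A model that cannot beat that baseline has probably not captured meaningful patterns. The sketch below does this in plain Java rather than through any JDM test interface. The class name, the `margin` parameter, and the toy labels are illustrative assumptions. For simplicity, the baseline uses the majority class of the test set itself; a stricter check would take the majority class from the training data.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: compare a classifier's test accuracy with a majority-class baseline. */
final class AssessmentSketch {

    /** Fraction of test cases where the predicted label equals the actual label. */
    static double accuracy(String[] predicted, String[] actual) {
        int correct = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i].equals(actual[i])) correct++;
        }
        return (double) correct / actual.length;
    }

    /** Accuracy obtained by always predicting the most frequent actual class. */
    static double majorityBaseline(String[] actual) {
        Map<String, Integer> counts = new HashMap<>();
        for (String a : actual) counts.merge(a, 1, Integer::sum);
        int max = 0;
        for (int c : counts.values()) max = Math.max(max, c);
        return (double) max / actual.length;
    }

    /** True only if the model beats the trivial baseline by more than the given margin. */
    static boolean beatsBaseline(String[] predicted, String[] actual, double margin) {
        return accuracy(predicted, actual) > majorityBaseline(actual) + margin;
    }

    public static void main(String[] args) {
        // Toy held-out test set: actual outcomes and two models' predictions.
        String[] actual = {"stay", "stay", "stay", "stay", "stay", "stay", "leave", "leave"};
        String[] modelA = {"stay", "stay", "stay", "stay", "stay", "stay", "stay", "stay"};
        String[] modelB = {"stay", "stay", "stay", "stay", "stay", "leave", "leave", "leave"};
        System.out.println(beatsBaseline(modelA, actual, 0.0));   // prints false
        System.out.println(beatsBaseline(modelB, actual, 0.0));   // prints true
    }
}
```

Model A reaches 75% accuracy, which sounds respectable, but it merely matches the baseline of always predicting "stay". Model B reaches 87.5% and is the only one that has learned something from the data.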

JDM provides extensive support for the modeling phase. Users new to data mining can specify problems at the mining function level. In this case, the data mining tool is responsible for selecting an appropriate algorithm and corresponding algorithm
