Preview of Java Data Mining 2.0 - Java Data Mining: Strategy, Standard, and Practice

Alternatively, the business analyst could build 1,200 individual classification models, use these models to score records, and choose the top predictions. The challenges here include building the models separately (using separate tasks and largely redundant BuildSettings objects), managing the lifecycle of these 1,200 models, invoking the individual models for scoring (either in real time or in batch), and, finally, selecting the top predictions among 1,200 results. From a performance point of view, if all the models use the same algorithm and the same predictors, much of the computation across the 1,200 model builds may be redundant, yet it cannot be shared across the distinct model builds.
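
The last step of this approach, selecting the top predictions among the 1,200 per-model results, can be sketched in plain Java. This is only an illustration under stated assumptions: the class `TopPredictions`, the method `topK`, and the target names are hypothetical and not part of the JDM API, and each model is assumed to have already produced one probability per record for its own target.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper for illustration; not part of the JDM API. */
final class TopPredictions {

    /**
     * Returns the names of the k highest-scoring targets, best first.
     * scoresByTarget maps a target name to the probability produced
     * by that target's own, separately built model.
     */
    static List<String> topK(Map<String, Double> scoresByTarget, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap on score: the weakest of the current top k sits at the head.
        Comparator<Map.Entry<String, Double>> byScore =
                (a, b) -> Double.compare(a.getValue(), b.getValue());
        PriorityQueue<Map.Entry<String, Double>> heap = new PriorityQueue<>(byScore);

        for (Map.Entry<String, Double> entry : scoresByTarget.entrySet()) {
            heap.offer(entry);
            if (heap.size() > k) {
                heap.poll(); // discard the current weakest candidate
            }
        }

        // Polling yields the weakest first, so fill the result from the back.
        String[] best = new String[heap.size()];
        for (int i = best.length - 1; i >= 0; i--) {
            best[i] = heap.poll().getKey();
        }
        return new ArrayList<>(Arrays.asList(best));
    }
}
```

The heap keeps at most k entries at a time, so the selection itself is inexpensive; the costs named above (1,200 builds, 1,200 model lifecycles, 1,200 scoring invocations) all occur before this step.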

A better alternative is to use multi-target models: a single model build task can be executed by specifying multiple targets in that build. During the build process, the algorithm can optimize the computations and avoid recomputing the same intermediate results. Similar optimizations may be possible during model apply. One outcome is improved overall performance. Another benefit of multi-target models is the convenience of each target including the other n - 1 potential targets as predictors without having to specify those various combinations explicitly. Multi-target models also have the opportunity to capture subtle interactions among the target attributes that affect predictions. Models built separately on the different targets cannot take these interactions into account.
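
The point about avoiding recomputation of intermediate results can be illustrated with a deliberately small sketch. It is not the algorithm of any JDM implementation: it assumes a single numeric predictor, regression targets only, and ordinary least squares, and the names `SharedStatsSketch` and `fitAll` are hypothetical. The predictor's mean and sum of squared deviations are computed once and reused for every target, which is the kind of sharing that separate model builds cannot exploit.

```java
/** Hypothetical illustration; not part of the JDM API. */
final class SharedStatsSketch {

    /**
     * Fits y_t = intercept_t + slope_t * x by least squares for every target t.
     * targets[t] holds the values of target t, aligned with x.
     * Returns one {intercept, slope} pair per target.
     */
    static double[][] fitAll(double[] x, double[][] targets) {
        int n = x.length;

        // Predictor statistics: computed once, shared by all targets.
        double meanX = 0.0;
        for (double v : x) {
            meanX += v;
        }
        meanX /= n;
        double sxx = 0.0;
        for (double v : x) {
            sxx += (v - meanX) * (v - meanX);
        }

        double[][] coefficients = new double[targets.length][];
        for (int t = 0; t < targets.length; t++) {
            double[] y = targets[t];

            // Only the target-specific statistics are computed per target.
            double meanY = 0.0;
            for (double v : y) {
                meanY += v;
            }
            meanY /= n;
            double sxy = 0.0;
            for (int i = 0; i < n; i++) {
                sxy += (x[i] - meanX) * (y[i] - meanY);
            }

            double slope = sxy / sxx;
            coefficients[t] = new double[] { meanY - slope * meanX, slope };
        }
        return coefficients;
    }
}
```

With 1,200 separate builds, the predictor statistics would be recomputed 1,200 times; here they are computed once. The sketch does not cover the other benefit described above, in which each target also serves as a predictor for the remaining targets.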

JDM 2.0 provides multi-target specification as a generalized supervised mining function that allows the specification of targets for classification, regression, or both in the same model. All the functionality present for classification and regression (e.g., test metrics or apply settings) is available for multi-target models.

18.7 Text Mining

“Text mining” is defined as follows:

Text mining, also known as intelligent text analysis, text data mining, unstructured data management, or knowledge-discovery in text (KDT), refers generally to the process of extracting interesting and non-trivial information and knowledge (usually converted to metadata elements) from unstructured text (i.e. free text). Text mining is a young interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics and computational linguistics. As most information (over 80%) is stored as text, text mining is believed to have a high commercial potential value. [Wikipedia 2006]
