Alternatively, the business analyst could build 1,200 individual
classification models and use these models to score records and
choose the top predictions. Challenges here include building the
models separately (using separate tasks and largely redundant
BuildSettings objects), managing the lifecycle of these 1,200 models,
invoking the individual models for scoring (either in real time or in
batch), and, finally, selecting the top predictions among 1,200
results. From a performance point of view, if all the models use the
same algorithm and the same predictors, much of the computation
across the 1,200 model builds is likely redundant, yet it cannot be
shared across the distinct model builds.
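The final step, selecting the top predictions among many per-target
results, can be sketched in plain Java. This is a minimal illustration
that does not use the JDM API; the class name TopPredictions, the
method topK, and the product names are hypothetical, and the scores are
assumed to be comparable probabilities, one per target.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of JDM: ranks per-target scores.
public class TopPredictions {

    /** Returns the names of the k highest-scoring targets, best first. */
    public static List<String> topK(Map<String, Double> scoresByTarget, int k) {
        List<Map.Entry<String, Double>> entries =
                new ArrayList<>(scoresByTarget.entrySet());
        // Sort by score, highest first.
        entries.sort(Map.Entry.<String, Double>comparingByValue(
                Comparator.reverseOrder()));
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        // Illustrative scores only; in practice there would be one
        // entry per individual model (1,200 in the scenario above).
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("productA", 0.12);
        scores.put("productB", 0.81);
        scores.put("productC", 0.47);
        System.out.println(topK(scores, 2)); // prints [productB, productC]
    }
}
```

With separate models, the application has to gather every score into
one structure like this before it can rank them.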
A better alternative is the use of multi-target models: a single
model build task is executed with multiple targets specified in that
build. During the build process, the algorithm can optimize the
computations and avoid recomputing the same intermediate results.
Similar optimizations may be possible during model apply. One
outcome is improved overall performance.
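To make the idea of a shared intermediate result concrete, the sketch
below computes the covariance of one predictor with several targets and
centers the predictor only once. It is an illustration only, not the
optimization any particular JDM implementation performs; the class
SharedStatistics and its method are hypothetical, and all columns are
assumed to be non-empty and of equal length.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not part of JDM.
public class SharedStatistics {

    /**
     * Population covariance of one predictor with each target column.
     * The centered predictor is computed once and reused for all targets.
     */
    public static Map<String, Double> covarianceWithTargets(
            double[] predictor, Map<String, double[]> targets) {
        int n = predictor.length;

        // Shared intermediate result: the mean-centered predictor.
        double mean = 0.0;
        for (double v : predictor) {
            mean += v;
        }
        mean /= n;
        double[] centered = new double[n];
        for (int i = 0; i < n; i++) {
            centered[i] = predictor[i] - mean;
        }

        // Per-target work reuses the centered predictor.
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> e : targets.entrySet()) {
            double[] t = e.getValue();
            double tMean = 0.0;
            for (double v : t) {
                tMean += v;
            }
            tMean /= n;
            double sum = 0.0;
            for (int i = 0; i < n; i++) {
                sum += centered[i] * (t[i] - tMean);
            }
            result.put(e.getKey(), sum / n);
        }
        return result;
    }
}
```

If each target were handled by a separate build, the centering step
would be repeated once per target instead of once overall.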
Another benefit of multi-target models is the convenience of each
target including the other n - 1 potential targets as predictors
without having to specify those various combinations explicitly.
Multi-target models also have the opportunity to include subtle
interactions among the target attributes that impact predictions. If
models are built separately on the different targets, they cannot
take this interaction into account.
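The combinations that would otherwise have to be spelled out by hand
can be enumerated as follows. This is a sketch in plain Java rather
than JDM code; the class TargetPredictorSets, its method, and the
attribute names in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not part of JDM.
public class TargetPredictorSets {

    /**
     * For each target, returns the base predictors followed by the
     * other n - 1 targets, which act as additional predictors.
     */
    public static Map<String, List<String>> predictorsPerTarget(
            List<String> basePredictors, List<String> targets) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String target : targets) {
            List<String> predictors = new ArrayList<>(basePredictors);
            for (String other : targets) {
                if (!other.equals(target)) {
                    predictors.add(other);
                }
            }
            result.put(target, predictors);
        }
        return result;
    }
}
```

For base predictors age and income and targets t1, t2, and t3, the
entry for t1 is [age, income, t2, t3]. With 1,200 targets, separate
builds would need 1,200 such lists maintained explicitly.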
JDM 2.0 provides multi-target specification as a generalized
supervised mining function that allows the specification of targets
for classification, regression, or both in the same model. All the
functionality present for classification and regression (e.g., involving
test metrics or apply settings) is available for multi-target models.
18.7 Text Mining
“Text mining” is defined as follows:
Text mining, also known as intelligent text analysis, text data mining,
unstructured data management, or knowledge-discovery in text (KDT), refers
generally to the process of extracting interesting and non-trivial information
and knowledge (usually converted to metadata elements) from unstructured text
(i.e. free text). Text mining is a young interdisciplinary field that draws on
information retrieval, data mining, machine learning, statistics and
computational linguistics. As most information (over 80%) is stored as text,
text mining is believed to have a high commercial potential value.
[Wikipedia 2006]