details, but also information such as the descriptive statistics of the
input attributes for display purposes. This is the case for some
clustering implementations, which are able to show the profiles for
all input attributes for all clusters. Other models may maintain cross-
statistics with the targets or all the input attributes.
In trying to optimize business performance, organizations are
dramatically increasing the number of predictive and descriptive
models they build with data mining techniques. For example, large
telecommunications operators are now building more than 1,500 models
a year for their CRM activity alone! They will later expand this
methodology into risk management, which will dramatically increase
the number of models built per year. This increases the disk
storage needed to persist not only the models, but also their build
settings, apply settings, and various result objects. When building
credit risk scoring models, for example, it is not uncommon to have
quality processes in place that require these settings to be kept: it
may be mandatory that a specific model's information can be retrieved,
such as when the model was produced, what dataset was used to
produce it, and what the test results were. Often, a business
wants or needs to keep track of the different model versions. This is
currently considered an application-level operation, because JDM
does not include model versioning. As such, users will likely design
naming conventions for their models, tasks, datasets, and all the other
objects that are saved in the mining object repository (MOR) to
externally manage the versioning of persisted objects. Consult your DME
vendor for specifics concerning storage requirements for different
models.
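
As a rough illustration of such a naming convention, the following sketch encodes a version number as a suffix on the object name. It is plain Java and does not call the JDM API; the class name VersionedNames, its methods, and the <BASE>_V<nnn> pattern are all hypothetical choices made for this example, not part of JDM or of any vendor's DME.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that encodes a version number in a mining object's name. */
public final class VersionedNames {
    // Assumed convention: <BASE>_V<version, at least 3 digits>, e.g. CHURN_MODEL_V007.
    private static final Pattern VERSIONED = Pattern.compile("^(.+)_V(\\d{3,})$");

    private VersionedNames() {}

    /** Builds a versioned name such as CHURN_MODEL_V003. */
    public static String format(String baseName, int version) {
        if (version < 1) {
            throw new IllegalArgumentException("version must be >= 1");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%s_V%03d", baseName, version);
    }

    /** Returns the version encoded in the name, or -1 if it does not follow the convention. */
    public static int versionOf(String name) {
        Matcher m = VERSIONED.matcher(name);
        return m.matches() ? Integer.parseInt(m.group(2)) : -1;
    }

    /** Returns the next free versioned name for baseName, given the names already in use. */
    public static String next(String baseName, List<String> existingNames) {
        int latest = 0;
        for (String name : existingNames) {
            Matcher m = VERSIONED.matcher(name);
            // Only names that share the same base count toward its version history.
            if (m.matches() && m.group(1).equals(baseName)) {
                latest = Math.max(latest, Integer.parseInt(m.group(2)));
            }
        }
        return format(baseName, latest + 1);
    }
}
```

An application could pass the result of next(...) as the object name when saving a model, build settings, or task to the MOR, so that each saved object carries its version in its name. The list of existing names is assumed to come from whatever listing facility the application or DME provides.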
15.4 Data Access
Data access is needed in three major phases of a model's life cycle:
(1) the build phase, where the model is created on a build dataset,
(2) the test phase, where metrics are computed to understand model
quality, and (3) the apply phase, where scores or forecasts are written
back to storage for later use. In these phases, data must be
accessed from the main repositories: data warehouses, operational
systems, and so on. When the DME is an independent server, there
must be data transfer between the data repository and the DME,
which increases network traffic. If IT management does not allow
mining the data in the actual data warehouses or operational data
stores, in-database and independent-server (direct data access) DME