The Impact of JDM on IT Infrastructure - Java Data Mining: Strategy, Standard, and Practice

may have to address the data staging issue, or introduce alternative
hardware architectures (e.g., clusters or data segregation) to ensure
reasonable non-interference with daily operations.

While the build task uses the data access layer embedded within
the DME itself, there are other ways to perform the apply task and
perhaps the test task; we will look at these tasks separately. Besides
the architectural constraints, there is also the administration
environment to consider.

15.4.1 Data Access for Model Building

As stated earlier, there are three types of DME architecture: the
in-database DME and two different layouts of independent-server DME.
The in-database architecture does not require any data transfer, since
the algorithms run where the data reside. The independent-server
architectures require data transfer, and there are two possibilities
in this case: (1) the DME implementation requires a copy of the data
in a temporary or proprietary format, which implies duplicating the
data and using additional disk space, or (2) the DME does not require
temporary or proprietary storage and accesses the data directly from
the repository; this generates more data traffic but does not require
additional disk space, and it reduces data latency issues. However,
the second case requires either a large amount of RAM to hold the data
or efficient mining techniques that retrieve and process the data in
manageable chunks.
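
The chunked alternative in case (2) can be illustrated with a minimal
sketch. It is not part of the JDM API: the class ChunkedScan, the
RunningStats helper, and the chunk size are hypothetical names chosen
for illustration, and a plain Iterator stands in for a cursor over the
repository. The idea is only that a statistic an algorithm needs (here
a running mean and variance, via Welford's method) can be updated one
chunk at a time, so memory use is bounded by the chunk size rather than
by the dataset size.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch; none of these names come from the JDM (JSR-73) API. */
final class ChunkedScan {

    /** Running mean and variance, updated one value at a time (Welford's method). */
    static final class RunningStats {
        long count;
        double mean;
        double m2; // sum of squared deviations from the current mean

        void add(double x) {
            count++;
            double delta = x - mean;
            mean += delta / count;
            m2 += delta * (x - mean);
        }

        /** Sample variance; defined as 0 when fewer than two values were seen. */
        double variance() {
            return count > 1 ? m2 / (count - 1) : 0.0;
        }
    }

    /**
     * Reads the source in chunks of at most chunkSize values, so that only
     * one chunk is held in memory at any time. The Iterator is an assumed
     * stand-in for a cursor over the data repository.
     */
    static RunningStats scan(Iterator<Double> source, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        RunningStats stats = new RunningStats();
        List<Double> chunk = new ArrayList<>(chunkSize);
        while (source.hasNext()) {
            chunk.clear();
            // Fill the next chunk from the source.
            while (source.hasNext() && chunk.size() < chunkSize) {
                chunk.add(source.next());
            }
            // Fold the chunk into the running statistics, then discard it.
            for (double x : chunk) {
                stats.add(x);
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        List<Double> column = List.of(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0);
        RunningStats s = scan(column.iterator(), 3);
        System.out.println(s.count + " rows, mean " + s.mean + ", variance " + s.variance());
    }
}
```

A smaller chunk size lowers peak memory at the cost of more round trips
to the repository, which is the data-traffic trade-off described above.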

In most cases, the build dataset is smaller than the datasets used as
input for apply. This can be attributed to one of two reasons: either
(1) the data is known only for the population involved in an
experiment, which is generally kept small for cost reasons, or
(2) robust models can be safely built on a sample of the entire
population, resulting in shorter build times for similar model quality
and robustness. In many practical situations, data access for the
build phase is therefore less demanding than for the apply phase.
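
Reason (2) can be made concrete with a small sampling sketch. The text
does not say how the sample is drawn, so reservoir sampling (Algorithm
R) is used here only as one reasonable assumption: it draws a uniform
sample of fixed size in a single pass, without knowing the population
size in advance. The class BuildSample and the method reservoirSample
are hypothetical names and are not part of the JDM API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch; none of these names come from the JDM (JSR-73) API. */
final class BuildSample {

    /**
     * Draws a uniform random sample of up to k rows in one pass over the
     * source (reservoir sampling, Algorithm R). If the source holds fewer
     * than k rows, all of them are returned.
     */
    static <T> List<T> reservoirSample(Iterator<T> rows, int k, Random rng) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        while (rows.hasNext()) {
            T row = rows.next();
            seen++;
            if (reservoir.size() < k) {
                // The first k rows fill the reservoir directly.
                reservoir.add(row);
            } else {
                // Pick a slot uniformly in [0, seen); the row is kept
                // only if that slot falls inside the reservoir.
                long j = (long) (rng.nextDouble() * seen);
                if (j < k) {
                    reservoir.set((int) j, row);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> population = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            population.add(i);
        }
        List<Integer> sample = reservoirSample(population.iterator(), 100, new Random());
        System.out.println("Build sample size: " + sample.size());
    }
}
```

Because the sample is drawn in a single pass, it could be taken while
the data is being read from the repository, so the full population
would not need to be copied into the modeling environment first.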

But the architecture of the DME and the volume of data are not the
only points to take into account. As noted earlier, IT management
policy against the use of data in the operational data environment may
impact data access and force the staging of data to isolate the
production environment from the modeling environment.
