15.4.2
Data Access for Apply and Test
As noted earlier, a model can be applied to very large datasets,
especially in terms of number of cases. For this reason, it is useful
to export the model from the DME in a language supported by the
scoring engine (e.g., a database). For example, IBM offers PMML
interpreters for several databases; Teradata also offers a PMML
interpreter; and KXEN can export models either as SQL or as User
Defined Functions for all major databases. In all these situations,
the apply phase runs within the execution environment (e.g., the
database) without external data transfer. This consumes computing
power on the database servers but adds no network traffic.
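To make the idea of exporting a model as SQL concrete, the following is a minimal sketch, assuming a simple linear model. The class `SqlScoringExport`, the method `toSql`, and the table and column names are hypothetical and chosen only for illustration; they do not correspond to the output of any vendor tool named above.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: renders a linear model as a SQL scoring query so the
// database can apply it in place, with no rows leaving the server.
final class SqlScoringExport {

    // Builds "SELECT key, intercept + (c1 * col1) + ... AS score FROM table".
    // Assumption: table and column names are trusted, valid SQL identifiers.
    static String toSql(String table, String keyColumn, double intercept,
                        Map<String, Double> coefficients) {
        StringJoiner terms = new StringJoiner(" + ");
        terms.add(format(intercept));
        for (Map.Entry<String, Double> e : coefficients.entrySet()) {
            terms.add("(" + format(e.getValue()) + " * " + e.getKey() + ")");
        }
        return "SELECT " + keyColumn + ", " + terms + " AS score FROM " + table;
    }

    // Fixed-point formatting with a locale-independent decimal separator.
    private static String format(double v) {
        return String.format(Locale.ROOT, "%.4f", v);
    }

    public static void main(String[] args) {
        // Insertion-ordered map keeps the generated terms in a stable order.
        Map<String, Double> coef = new LinkedHashMap<>();
        coef.put("age", 0.03);
        coef.put("income", -0.0002);
        System.out.println(toSql("customers", "customer_id", 1.5, coef));
    }
}
```

The generated statement can be run by the database itself, so the case table is scored where it resides and only the query text crosses the network.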
In contrast, in-database DMEs, such as Oracle, perform apply and
test at a layer below application-level SQL or User Defined
Functions. This type of access eliminates the security, process, and
database read overheads typically incurred by application-level
code. As a result, in-database scoring can achieve better
performance than externally generated code.
15.5
Backup and Recovery
Backup and recovery plans are critical for any IT organization. IT staff
are already quite familiar with performing database and file system
backups on a regular basis. In-database DMEs or DMEs with database-
hosted MORs make backup and recovery a part of normal database
maintenance. Where models are maintained separately from the
database (e.g., as flat files in either PMML or proprietary formats),
users must rely on file system backups or ad hoc procedures.
15.6
Scheduling
In an IT environment where hardware is plentiful and software
scales well with the addition of hardware, the decision of when
to execute data mining tasks can depend more on business
requirements than on technical ones. However, most IT departments
have budget constraints, both in terms of hardware purchases and
personnel to manage and maintain their hardware and software. For
this reason, scheduling of data mining tasks becomes important to
use existing hardware efficiently. This section assumes that the data