building and use of a single or a few models, not hundreds or thou-
sands. Model management also includes the ability to analyze
deployed models, such as those that are underperforming or are
being dominated by other models. Standards need to address these
model management tasks.
The data mining community lacks standard benchmarks, either
for model building or scoring. These benchmarks are needed across
two dimensions: performance (or speed) and accuracy (or quality). One
issue in this space is that it can be difficult to compare one vendor's
implementation of a given algorithm against another's since they
likely have dramatic differences in feature sets. For example, a given
vendor's decision tree algorithm may differ in its support for
missing-value handling (surrogates), statistics, number of splits at a
given node, and so on. However, progress can be made in defining
benchmarks. Model build performance can be measured for several classes
of mining functions, or even algorithms, based on the accuracy (in the
case of supervised models) that a model is able to achieve in a given
execution time. More readily comparable are scoring performance
numbers. Using a given hardware platform, for example, how fast
can a DME score a large dataset (e.g., 100 million customers), using
1 model or using 100 models? The accuracy of those scores can be
validated relative to known values, essentially producing a JDM test
metrics object.
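
The scoring benchmark described above can be sketched in a few lines of
plain Java. This is only an illustration under stated assumptions: it
does not use the javax.datamining API, a "model" is stood in for by a
function from a row of attribute values to a predicted class, and the
names ScoringBenchmark, Result, and run are hypothetical. It times the
scoring pass for 1 model and for 100 models and then validates the
scores against known target values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

// Hypothetical harness; not part of JDM or any vendor's DME.
public class ScoringBenchmark {

    /** Outcome of one run: how many scores were produced, how long it took, how many were right. */
    public static final class Result {
        public final long rowsScored;
        public final double seconds;
        public final double accuracy;

        Result(long rowsScored, double seconds, double accuracy) {
            this.rowsScored = rowsScored;
            this.seconds = seconds;
            this.accuracy = accuracy;
        }

        public double rowsPerSecond() {
            return seconds > 0 ? rowsScored / seconds : Double.POSITIVE_INFINITY;
        }
    }

    /** Scores every row with every model; only the scoring loop is timed. */
    public static Result run(List<ToIntFunction<double[]>> models,
                             double[][] rows, int[] knownTargets) {
        long correct = 0;
        long scored = 0;
        double seconds = 0.0;
        for (ToIntFunction<double[]> model : models) {
            int[] predicted = new int[rows.length];
            long start = System.nanoTime();
            for (int i = 0; i < rows.length; i++) {
                predicted[i] = model.applyAsInt(rows[i]);
            }
            seconds += (System.nanoTime() - start) / 1e9;
            // Validation against known values happens outside the timed region.
            for (int i = 0; i < rows.length; i++) {
                if (predicted[i] == knownTargets[i]) {
                    correct++;
                }
            }
            scored += rows.length;
        }
        double accuracy = scored == 0 ? 0.0 : (double) correct / scored;
        return new Result(scored, seconds, accuracy);
    }

    public static void main(String[] args) {
        // Toy dataset (an assumption): one attribute per row, target is 1 above 0.5.
        int n = 100_000;
        double[][] rows = new double[n][1];
        int[] targets = new int[n];
        for (int i = 0; i < n; i++) {
            rows[i][0] = (i % 100) / 100.0;
            targets[i] = rows[i][0] > 0.5 ? 1 : 0;
        }
        ToIntFunction<double[]> threshold = r -> r[0] > 0.5 ? 1 : 0;

        List<ToIntFunction<double[]>> one = List.of(threshold);
        List<ToIntFunction<double[]>> hundred = new ArrayList<>();
        for (int k = 0; k < 100; k++) {
            hundred.add(threshold);
        }

        for (List<ToIntFunction<double[]>> models : List.of(one, hundred)) {
            Result r = run(models, rows, targets);
            System.out.printf("%d model(s): %.0f rows/s, accuracy %.3f%n",
                    models.size(), r.rowsPerSecond(), r.accuracy);
        }
    }
}
```

A real benchmark would also need to fix the hardware platform, warm up
the JVM before timing, and use a dataset large enough to be
representative; those details are left out of this sketch.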
One of the concerns for standards supporting interoperability is a
lack of test suites for conformance or interoperability. Test suites for
conformance determine whether the implementation meets the
essence of the standard specification. For example, in JDM, conform-
ance involves providing the required packages, classes, and methods
defined by the standard and ensuring that certain minimal behaviors
are met. Test suites for interoperability determine whether two ven-
dors will be able to exchange models. For example, if a vendor has
proprietary extensions, this will inhibit interoperability. However, if
users and vendors can operate without using those proprietary
extensions, interoperability is achievable. Currently, it is often up to
individual vendors (or their customers) to determine what can be
exchanged and what cannot.
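
The structural half of a conformance test (are the required classes and
methods present?) can be approximated with Java reflection. The sketch
below is a minimal illustration, not an official test suite: the class
ConformanceCheck and its findMissing method are hypothetical, and the
list of required names must be supplied by the caller from the
specification. The demo uses JDK classes as stand-ins so that it runs
without any JDM implementation on the classpath.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical checker; the required names are inputs, not taken from any standard here.
public class ConformanceCheck {

    /**
     * For each required class name, verifies the class can be loaded and that
     * it exposes a public method with each required name.
     * Returns a list of problems; an empty list means everything was found.
     */
    public static List<String> findMissing(Map<String, List<String>> required) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : required.entrySet()) {
            String className = entry.getKey();
            Class<?> type;
            try {
                // Load without running static initializers.
                type = Class.forName(className, false,
                        ConformanceCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError ex) {
                problems.add("missing class: " + className);
                continue;
            }
            for (String methodName : entry.getValue()) {
                boolean found = false;
                for (Method m : type.getMethods()) {
                    if (m.getName().equals(methodName)) {
                        found = true;
                        break;
                    }
                }
                if (!found) {
                    problems.add("missing method: " + className + "#" + methodName);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Stand-in requirements (assumptions for the demo only).
        Map<String, List<String>> required = new LinkedHashMap<>();
        required.put("java.util.List", List.of("size", "isEmpty"));
        required.put("com.example.vendor.Missing", List.of("build"));

        for (String problem : findMissing(required)) {
            System.out.println(problem);
        }
    }
}
```

This checks names only. It says nothing about method signatures or about
the minimal behaviors a standard requires, which would need functional
tests, and it does not address interoperability, where the question is
whether a model exported by one vendor can be imported by another.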
Although SQL/MM DM has been defined, data mining vendors
such as Oracle and Microsoft have chosen to extend the SQL lan-
guage syntax to accommodate their database mining capabilities. As
offerings of database mining mature and continue to extend SQL,
having standard SQL language extensions will simplify the mining