of these standards and the need for backward compatibility, such
unification proved difficult, if not impossible. However, open
communication and cross-pollination between standards, helped by
individuals who participated in multiple standards efforts, led to
progress. JDM leveraged concepts and terminology from
PMML, SQL/MM DM, and CWM/DM while recommending concepts,
terminology, and realizations back to these standards.
For the second release of JDM, the JSR-247 expert group plans to
address data mining transformations, more advanced statistics, and
the breadth of mining functions. The expert group also plans to
provide a generic interface enabling the specification of settings as
name-value pairs. Chapter 18 discusses these in more detail.
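To make the idea of settings as name-value pairs concrete, here is a minimal sketch in Java. It is an illustration only: the class GenericSettings, its methods, and the setting names "algorithm" and "maxDepth" are hypothetical and are not taken from the JDM API or from the JSR-247 plans.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical illustration of settings held as name-value pairs.
 * This class is NOT part of the JDM (javax.datamining) API.
 */
final class GenericSettings {
    // Insertion-ordered so settings are listed in the order they were given.
    private final Map<String, Object> values = new LinkedHashMap<>();

    /** Stores one name-value pair and returns this object to allow chaining. */
    GenericSettings set(String name, Object value) {
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("setting name must be non-empty");
        }
        values.put(name, value);
        return this;
    }

    /** Returns the value for a name, or the default if the name is not set. */
    <T> T get(String name, Class<T> type, T defaultValue) {
        Object value = values.get(name);
        if (value == null) {
            return defaultValue;
        }
        if (!type.isInstance(value)) {
            throw new IllegalArgumentException(
                "setting '" + name + "' is not of type " + type.getSimpleName());
        }
        return type.cast(value);
    }

    /** Read-only view of all pairs, e.g. for passing to a vendor implementation. */
    Map<String, Object> asMap() {
        return Collections.unmodifiableMap(values);
    }

    public static void main(String[] args) {
        // Setting names below are assumptions made for this example.
        GenericSettings settings = new GenericSettings()
            .set("algorithm", "decisionTree")
            .set("maxDepth", 5);
        System.out.println(settings.asMap());
        System.out.println(settings.get("maxDepth", Integer.class, 10));
    }
}
```

The trade-off in a design like this is that new or vendor-specific settings can be supplied without changing the interface, while type checking moves from compile time to run time.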
Directions for Data Mining Standards
With all this standardization, what remains to be done? What is
needed in the data mining community? Generally speaking, there are
a few key areas that are not adequately addressed:
Model management
Benchmarks for both model building and apply, with the emphasis on apply for batch and real-time
Test suites for conformance and interoperability certification
SQL language extensions
Conceptual and terminological convergence
Model management can take on many guises. One perspective
includes the management and analysis of a large collection of data
mining models [Liu/Tuzhilin 2006]. Model management is quickly
becoming an issue of enormous concern for large-scale users of data
mining. The ability to find models meeting certain characteristics or
criteria is key where many users are building and testing many models,
often hundreds or thousands. Such criteria include a certain
signature (required attributes), accuracy or model quality
characteristics such as error rates, and performance factors such as
model size. Model management also includes the ability to build
many models efficiently as well as to score datasets using many
models efficiently. This is the area in which APIs are often critical to
success because graphical interfaces are often geared toward the