Data Mining Standards
Java Data Mining (JDM) is not the first standard in the data mining
space. That distinction belongs to the Predictive Model Markup
Language (PMML), developed by the Data Mining Group (DMG), which
released version 1.0 in August 1999; the current version is 3.1.
PMML was followed by the CWM data mining extension, developed
through the Object Management Group (OMG), which started in 1999
and released version 1.0 in 2001 [CWM 2005]. SQL/MM Part 6: Data
Mining [SQL/MM DM 2006], part of the SQL standard from Joint
Technical Committee 1 (JTC 1) of the International Organization for
Standardization (ISO) [ISO 2006] and the International
Electrotechnical Commission (IEC) [IEC 2006], began in 1999 and made
its first release in 2002. JDM began as JSR-73 in 2000, with its
first release in 2004; a second release is underway through JSR-247.
These standards are discussed here to provide a broader context for
understanding JDM and to show how some of them may be used in
combination with JDM.
Predictive Model Markup Language
PMML is an XML-based markup language for describing both statistical
and data mining models. Its primary goal is to enable the interchange
of data mining models between systems as well as between vendor
implementations. PMML supports the description of the model's input
(e.g., required data fields), the transformations necessary to
prepare the data for scoring, and the parameters that define the
data mining model [DMG 2005].
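
To make the three parts of a PMML description concrete (input fields,
transformations, and model parameters), the following is a minimal
sketch in Python. The XML fragment is hand-written and simplified: it
is modeled on PMML element names but omits the XML namespace and has
not been validated against the PMML schema. The field names (age,
income, age_scaled), the coefficient values, and the helper functions
describe and score are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment modeled on PMML element names.
# Assumptions: no XML namespace, not schema-validated, invented values.
PMML_TEXT = """
<PMML version="3.1">
  <DataDictionary numberOfFields="2">
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="income" optype="continuous" dataType="double"/>
  </DataDictionary>
  <TransformationDictionary>
    <DerivedField name="age_scaled" optype="continuous" dataType="double">
      <NormContinuous field="age">
        <LinearNorm orig="0" norm="0"/>
        <LinearNorm orig="100" norm="1"/>
      </NormContinuous>
    </DerivedField>
  </TransformationDictionary>
  <RegressionModel functionName="regression">
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age_scaled" coefficient="50.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""


def describe(pmml_text):
    """Return (input field names, derived field names, model parameters)."""
    root = ET.fromstring(pmml_text.strip())
    inputs = [f.get("name") for f in root.iter("DataField")]
    derived = [f.get("name") for f in root.iter("DerivedField")]
    table = root.find("./RegressionModel/RegressionTable")
    params = {"intercept": float(table.get("intercept"))}
    for predictor in table.findall("NumericPredictor"):
        params[predictor.get("name")] = float(predictor.get("coefficient"))
    return inputs, derived, params


def score(pmml_text, age):
    """Apply the linear normalization to age, then the regression equation."""
    root = ET.fromstring(pmml_text.strip())
    norms = root.findall(".//NormContinuous[@field='age']/LinearNorm")
    (x0, y0), (x1, y1) = [
        (float(n.get("orig")), float(n.get("norm"))) for n in norms
    ]
    # Linear interpolation between the two (orig, norm) anchor points.
    age_scaled = y0 + (age - x0) * (y1 - y0) / (x1 - x0)
    _, _, params = describe(pmml_text)
    return params["intercept"] + params["age_scaled"] * age_scaled


if __name__ == "__main__":
    print(describe(PMML_TEXT))
    print(score(PMML_TEXT, 50.0))
```

Because the input fields, the data preparation step, and the model
parameters all travel in one document, a consuming system can score
new data without access to the tool that built the model. That is the
interchange goal described above.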
PMML emerged in 1997 because the need to exchange data
mining models in a vendor-neutral format was viewed as important
to moving the industry forward. PMML has evolved over the
ensuing years, with successive releases becoming more precise in
the definitions of data mining terminology and their use in specific
types of models. Initially, PMML defined a set of the most common
representations on which participating vendors could agree.
PMML is developed by the DMG—an independent, vendor-led
group that develops data mining standards. The DMG is a voting-
based organization; voting determines both direction and content.
Membership consists of voting members and associate members.
Generally, extensions and modifications to the PMML standard come
from one or more members taking the lead to modify a component of