Uniform Resource Identifiers for Datasets
Input data for data mining, in the form of build, test, or apply
datasets, can come in a wide variety of formats and locations. Structured
data for data mining often comes from database tables or views, but may
also reside in flat files in a file system, in formats such as CSV or
Excel. To support various formats and locations,
JDM defines a data location as a simple Uniform Resource Identifier
(URI) string. JDM recommends using a URI string in conformance
with the specification defined by RFC 2396: Uniform Resource Identifiers
(URI): Generic Syntax, amended by RFC 2732: Format for Literal
IPv6 Addresses in URLs.
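As a rough illustration of what conformance means in practice, the standard java.net.URI class parses strings according to RFC 2396 as amended by RFC 2732, so it can serve as a syntax check for a data location string. The class name DataLocationCheck, the method isValidLocation, and the IPv6 location below are hypothetical and are not part of JDM.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch only: DataLocationCheck is a hypothetical helper, not a JDM class.
class DataLocationCheck {

    /** Returns true if the string parses as a URI under RFC 2396 / RFC 2732. */
    static boolean isValidLocation(String location) {
        try {
            new URI(location);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A file URI, as used for flat-file datasets.
        URI file = URI.create("file:///C:/DMData");
        System.out.println(file.getScheme()); // file
        System.out.println(file.getPath());   // /C:/DMData

        // RFC 2732 permits a literal IPv6 address in square brackets as the host.
        URI v6 = URI.create("http://[::1]:8080/data/build.csv"); // hypothetical location
        System.out.println(v6.getPort());     // 8080

        // A raw Windows path is not a valid URI because of the backslash.
        System.out.println(isValidLocation("C:\\DMData")); // false
    }
}
```

A check like this validates syntax only; whether a vendor's DME can actually read from the location is a separate matter.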
For vendors who support data mining in the database, where the
database itself acts as the DME, users can specify the URI for data as
a table or view name, since the connection already authenticates and
connects the user to a database schema. Most database vendors support
schemas within the database and remote database links; in those
cases, users can write schemaName.tableName to identify tables or views
in other schemas. If no schema name is specified, the data is assumed
to be in the user's local schema.
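The defaulting rule can be sketched as a small helper that splits a location string at the first period and falls back to the local schema. This is a minimal sketch under simplifying assumptions: the class TableLocation, its resolve method, and the schema and table names are hypothetical, and quoted identifiers and database links are not handled.

```java
// Sketch only: TableLocation is a hypothetical helper, not part of JDM.
class TableLocation {
    final String schema;
    final String table;

    TableLocation(String schema, String table) {
        this.schema = schema;
        this.table = table;
    }

    /**
     * Splits "schemaName.tableName" at the first period. If there is no
     * period, the table is assumed to live in the user's local schema.
     */
    static TableLocation resolve(String location, String localSchema) {
        int dot = location.indexOf('.');
        if (dot < 0) {
            return new TableLocation(localSchema, location);
        }
        return new TableLocation(location.substring(0, dot),
                                 location.substring(dot + 1));
    }

    public String toString() {
        return schema + "." + table;
    }

    public static void main(String[] args) {
        // Unqualified name: resolved against the local schema.
        System.out.println(resolve("CUSTOMERS_BUILD", "DMUSER"));       // DMUSER.CUSTOMERS_BUILD
        // Qualified name: the explicit schema wins.
        System.out.println(resolve("SALES.CUSTOMERS_BUILD", "DMUSER")); // SALES.CUSTOMERS_BUILD
    }
}
```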
For vendors who support data mining outside the database, but
mine data accessed from the database, users can use a JDBC
URL as the URI specification. Section 16.2.5 gives some examples of
these URIs. For vendors who support file input, users can use
a file URI, for example, file:///C:/DMData.
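To make the three styles of data location concrete, the following sketch classifies a location string by its URI scheme. It is illustrative only: the class DataUriKind, the returned labels, and the sample JDBC URL and table names are assumptions, and an actual DME decides for itself which forms it accepts.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Sketch only: DataUriKind and its labels are hypothetical, not part of JDM.
class DataUriKind {

    /** Classifies a data location string by its URI scheme. */
    static String classify(String location) {
        try {
            String scheme = new URI(location).getScheme();
            if (scheme == null) {
                // No scheme: treat as a table or view name, optionally schema-qualified.
                return "TABLE_OR_VIEW";
            }
            if (scheme.equalsIgnoreCase("jdbc")) {
                return "JDBC";
            }
            if (scheme.equalsIgnoreCase("file")) {
                return "FILE";
            }
            return "OTHER";
        } catch (URISyntaxException e) {
            return "INVALID";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("DMUSER.CUSTOMERS_BUILD"));                // TABLE_OR_VIEW
        System.out.println(classify("jdbc:postgresql://localhost:5432/dmdb")); // JDBC
        System.out.println(classify("file:///C:/DMData"));                     // FILE
    }
}
```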
JDM uses type-safe enumerated types to define the possible values
for arguments and settings. For example, to list the possible values of
mining functions, JDM uses the javax.datamining.MiningFunction
enumerated type. MiningFunction defines the following enumerated values
that can be used in the API: association, attributeImportance,
classification, clustering, regression.
Enumerations are defined as classes to provide a standard implementation
for enumerations and to ease migration to the J2SE 5.0 enum types.
Because JDM 1.0 and 1.1 must be compatible with J2SE 1.4 and above,
JDM does not use the J2SE 5.0 enum or other new language features.
Instead, it defines a JDM-specific Enum
class that is compatible with J2SE 5.0.
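For readers unfamiliar with how enumerations were written before J2SE 5.0, the following is a minimal sketch of the general type-safe enum pattern, using the five mining function names listed above. It is not the JDM source: MiningFunctionSketch is a hypothetical stand-in for javax.datamining.MiningFunction, and the actual JDM Enum base class may differ in its methods and structure.

```java
// Sketch only: a J2SE 1.4-compatible type-safe enum. MiningFunctionSketch is
// hypothetical and does not reproduce the real javax.datamining.MiningFunction.
final class MiningFunctionSketch {
    private final String name;

    // Private constructor: no instances exist beyond the constants below.
    private MiningFunctionSketch(String name) {
        this.name = name;
    }

    public static final MiningFunctionSketch association =
        new MiningFunctionSketch("association");
    public static final MiningFunctionSketch attributeImportance =
        new MiningFunctionSketch("attributeImportance");
    public static final MiningFunctionSketch classification =
        new MiningFunctionSketch("classification");
    public static final MiningFunctionSketch clustering =
        new MiningFunctionSketch("clustering");
    public static final MiningFunctionSketch regression =
        new MiningFunctionSketch("regression");

    // Declared after the constants so they are initialized first.
    private static final MiningFunctionSketch[] VALUES = {
        association, attributeImportance, classification, clustering, regression
    };

    public String name() {
        return name;
    }

    /** Returns a copy so callers cannot modify the internal array. */
    public static MiningFunctionSketch[] values() {
        return (MiningFunctionSketch[]) VALUES.clone();
    }

    /** Looks up a constant by name, mirroring the J2SE 5.0 valueOf convention. */
    public static MiningFunctionSketch valueOf(String name) {
        for (int i = 0; i < VALUES.length; i++) {
            if (VALUES[i].name.equals(name)) {
                return VALUES[i];
            }
        }
        throw new IllegalArgumentException("Unknown mining function: " + name);
    }

    public String toString() {
        return name;
    }

    public static void main(String[] args) {
        MiningFunctionSketch f = MiningFunctionSketch.valueOf("classification");
        // Each value is a singleton, so reference comparison is safe.
        System.out.println(f == MiningFunctionSketch.classification); // true
        System.out.println(MiningFunctionSketch.values().length);     // 5
    }
}
```

Because the constructor is private and each value is a single shared instance, callers get compile-time type checking and can compare values with ==, which is the same guarantee a J2SE 5.0 enum provides. Providing name, values, and valueOf with matching signatures is one way to keep calling code largely unchanged after a later migration.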