Notice that MiningInputStream implements the interface MiningVectorSet, which is the most general container of mining vectors. MiningVectorSet contains only the method getMetaData to access the basis of the attribute space.
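The relationship between the two types can be sketched in Java as follows. This is a minimal sketch under assumptions, not the XELOPES API. The class AttributeSpace, its methods dimension and attributeName, and the representation of mining vectors as double[] are invented for illustration, and the real XELOPES signatures differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-ins for the types named in the text;
// these are NOT the real XELOPES signatures.

// Hypothetical description of the attribute space: just the attribute names.
final class AttributeSpace {
    private final List<String> names;

    AttributeSpace(List<String> names) {
        this.names = Collections.unmodifiableList(new ArrayList<>(names));
    }

    int dimension() { return names.size(); }

    String attributeName(int index) { return names.get(index); }
}

// Most general container of mining vectors: it only exposes the basis
// of the attribute space, nothing about how the vectors are accessed.
interface MiningVectorSet {
    AttributeSpace getMetaData();
}

// Countable container of mining vectors, traversed with a cursor.
abstract class MiningInputStream implements MiningVectorSet {
    private final AttributeSpace metaData;

    protected MiningInputStream(AttributeSpace metaData) { this.metaData = metaData; }

    @Override
    public AttributeSpace getMetaData() { return metaData; }

    // Advances the cursor; returns false when no vector is left.
    public abstract boolean next();

    // Returns the mining vector at the current cursor position.
    public abstract double[] read();
}
```

The design point is that code written against MiningVectorSet can ask only for the attribute space; anything that needs to iterate over vectors has to depend on the more specific stream type.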
MiningInputStream, as a container of countable sets of mining vectors, is the most important implementation of MiningVectorSet. However, some applications such as reinforcement learning (Sect. 12.2.2) require access to uncountable sets of mining vectors, such as all points of a domain or all unit vectors starting from one point. In this case, the mining vector set must be represented at a more abstract level, for example by functions describing its boundary or by geometric objects. We will return to this topic in Sect. 12.2.2; at this point, we note that MiningInputStream is sufficient for data mining applications.
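To make the contrast concrete, here is a hypothetical sketch of an uncountable mining vector set: all unit vectors starting at a fixed point in the plane. It is not part of XELOPES; the class UnitVectorSet and its methods are invented for illustration. Instead of stored rows, the set is represented by a geometric object (a circle), given by a parametrization and a membership test.

```java
// Hypothetical sketch: the set of all unit vectors starting at a fixed point
// in the plane is uncountable, so it is represented by a geometric object
// (a circle) rather than by enumerated rows.
final class UnitVectorSet {
    private final double ox;
    private final double oy;

    UnitVectorSet(double ox, double oy) {
        this.ox = ox;
        this.oy = oy;
    }

    // Parametrization: angle t (radians) -> end point of the unit vector.
    double[] endPoint(double t) {
        return new double[] {ox + Math.cos(t), oy + Math.sin(t)};
    }

    // Membership test: is p (within tol) the end point of some unit vector?
    boolean contains(double[] p, double tol) {
        return Math.abs(Math.hypot(p[0] - ox, p[1] - oy) - 1.0) <= tol;
    }

    // When an algorithm needs concrete vectors, it can draw finitely many,
    // here n equally spaced directions.
    double[][] sample(int n) {
        double[][] out = new double[n][];
        for (int i = 0; i < n; i++) {
            out[i] = endPoint(2.0 * Math.PI * i / n);
        }
        return out;
    }
}
```

A cursor-based stream could only ever deliver the finite sample; the parametrization and the membership test describe the whole set.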
MiningInputStream provides a graded spectrum of data access methods, depending on its implementation. In the simplest case, the data matrix can be traversed only once, with a cursor that is advanced by the method next. If the reset method is supported, the cursor can be set back to the initial position; this access type is often supported by files and databases. In a more comfortable case, the cursor can be moved arbitrarily using the move method (e.g., for databases supporting JDBC 2.0). Even more comfortable is direct access to the data array of the data matrix if the matrix fits into memory (e.g., class MiningStoredData).
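The four access levels can be illustrated with a minimal in-memory sketch. The method names next, reset, move, and read follow the text, but their signatures (a 0-based row index for move, double[] vectors) and the method getDataMatrix are assumptions for illustration, not the XELOPES API. For comparison, a JDBC 2.0 scrollable ResultSet offers similar cursor positioning through absolute(int), which is 1-based.

```java
// Hypothetical in-memory stream supporting all four access levels.
final class ArrayMiningStream {
    private final double[][] data; // the whole data matrix fits into memory
    private int cursor = -1;       // positioned before the first row

    ArrayMiningStream(double[][] data) { this.data = data; }

    // Level 1: one forward pass; returns false when no row is left.
    boolean next() {
        if (cursor + 1 >= data.length) return false;
        cursor++;
        return true;
    }

    // Level 2: set the cursor back to the initial position.
    void reset() { cursor = -1; }

    // Level 3: move the cursor to an arbitrary row (0-based).
    boolean move(int position) {
        if (position < 0 || position >= data.length) return false;
        cursor = position;
        return true;
    }

    // Returns a copy of the mining vector at the current cursor position.
    double[] read() {
        if (cursor < 0 || cursor >= data.length) {
            throw new IllegalStateException("no current row");
        }
        return data[cursor].clone();
    }

    // Level 4: direct access to the underlying data array.
    double[][] getDataMatrix() { return data; }
}
```

An algorithm that only calls next and read works with every level; the other methods are optimizations that only some sources can offer.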
The read method returns the mining vector at the current cursor position. Each full implementation of MiningInputStream must support at least the next and read methods. In addition, MiningInputStream may implement the interface MiningOutputStream to write data to the data source. Each mining input stream is reflective: the method getSupportedStream returns all data access (and update) methods supported by the current implementation.
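Reflection lets an algorithm adapt to whatever the stream offers. In the following sketch, everything is hypothetical: the AccessMethod enum, the EnumSet return type of getSupportedStream, and the classes RowStream and TwoPassVariance are assumptions, not the XELOPES API. A two-pass variance computation rewinds the stream when reset is supported and otherwise buffers the needed column during the first pass.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Hypothetical capability flags reported by a stream.
enum AccessMethod { NEXT, READ, RESET, MOVE, WRITE }

// Hypothetical reflective stream: next and read are mandatory, reset is optional.
interface ReflectiveStream {
    boolean next();
    double[] read();

    default void reset() {
        throw new UnsupportedOperationException("reset not supported");
    }

    // Reports which access (and update) methods this implementation supports.
    EnumSet<AccessMethod> getSupportedStream();
}

// In-memory rows; resettable == false simulates a forward-only source.
final class RowStream implements ReflectiveStream {
    private final double[][] rows;
    private final boolean resettable;
    private int cursor = -1;

    RowStream(double[][] rows, boolean resettable) {
        this.rows = rows;
        this.resettable = resettable;
    }

    public boolean next() {
        if (cursor + 1 >= rows.length) return false;
        cursor++;
        return true;
    }

    public double[] read() { return rows[cursor].clone(); }

    @Override
    public void reset() {
        if (!resettable) throw new UnsupportedOperationException("reset not supported");
        cursor = -1;
    }

    public EnumSet<AccessMethod> getSupportedStream() {
        return resettable
                ? EnumSet.of(AccessMethod.NEXT, AccessMethod.READ, AccessMethod.RESET)
                : EnumSet.of(AccessMethod.NEXT, AccessMethod.READ);
    }
}

final class TwoPassVariance {
    // Population variance of column 0. Pass 2 rereads the stream if it can be
    // reset; otherwise it uses the values buffered during pass 1.
    static double ofFirstColumn(ReflectiveStream s) {
        boolean canReset = s.getSupportedStream().contains(AccessMethod.RESET);
        List<Double> buffer = new ArrayList<>();
        double sum = 0.0;
        int n = 0;
        while (s.next()) {
            double x = s.read()[0];
            sum += x;
            n++;
            if (!canReset) buffer.add(x);
        }
        if (n == 0) throw new IllegalArgumentException("empty stream");
        double mean = sum / n;
        double squares = 0.0;
        if (canReset) {
            s.reset();
            while (s.next()) {
                double d = s.read()[0] - mean;
                squares += d * d;
            }
        } else {
            for (double x : buffer) {
                double d = x - mean;
                squares += d * d;
            }
        }
        return squares / n;
    }
}
```

Both paths give the same result; the reflective check only decides whether memory or a second pass over the source is spent.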
The mining input stream concept is a direct consequence of the fact that almost every data mining algorithm requires a data matrix as input. In the language of CWM, we would say that the logical model of data mining is of the Classifier type. Thus, MiningInputStream extends the CWM class Class.
The physical model describes the physical data source that is used for mining, such as a text file or a database. For the data mining process, the physical model must be mapped to the logical one (Fig. 12.7).
In XELOPES, this mapping is done by subclassing: different types of physical data sources are accessed through different mining input stream classes that extend MiningInputStream. Important stream classes of XELOPES are listed in Table 12.1. It is often useful to write one's own resource classes that extend MiningInputStream or one of its subclasses.
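As an illustration of such a subclass, the following sketch maps one physical source, comma-separated text with a header line, onto the logical model of numeric mining vectors. The base class SimpleMiningInputStream is a simplified, hypothetical stand-in for MiningInputStream, CsvMiningStream is an invented name, and the sketch assumes that all values are numeric.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for MiningInputStream (logical model).
abstract class SimpleMiningInputStream {
    private final List<String> attributeNames;

    protected SimpleMiningInputStream(List<String> attributeNames) {
        this.attributeNames = Collections.unmodifiableList(attributeNames);
    }

    List<String> getAttributeNames() { return attributeNames; }

    abstract boolean next();

    abstract double[] read();
}

// Hypothetical subclass for one physical source: comma-separated text whose
// first line holds the attribute names and whose other lines hold numbers.
final class CsvMiningStream extends SimpleMiningInputStream {
    private final BufferedReader in;
    private double[] current;

    private CsvMiningStream(BufferedReader in, List<String> header) {
        super(header);
        this.in = in;
    }

    static CsvMiningStream open(Reader source) {
        try {
            BufferedReader in = new BufferedReader(source);
            String header = in.readLine();
            if (header == null) throw new IllegalArgumentException("empty source");
            return new CsvMiningStream(in, Arrays.asList(header.trim().split("\\s*,\\s*")));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses the next non-blank line into a mining vector.
    @Override
    boolean next() {
        try {
            String line = in.readLine();
            while (line != null && line.trim().isEmpty()) line = in.readLine();
            if (line == null) {
                current = null;
                return false;
            }
            String[] cells = line.trim().split("\\s*,\\s*");
            if (cells.length != getAttributeNames().size()) {
                throw new IllegalArgumentException("wrong number of values: " + line);
            }
            current = new double[cells.length];
            for (int i = 0; i < cells.length; i++) current[i] = Double.parseDouble(cells[i]);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    double[] read() {
        if (current == null) throw new IllegalStateException("no current row");
        return current.clone();
    }
}
```

Code that only knows the base class can consume this stream unchanged; supporting another source (a database table, a binary file) means writing another subclass, not changing the algorithms.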
Notice that the last three streams are composed streams: they take an arbitrary mining input stream as input and apply a transformation, a multidimensional selection, and an ordering to it, respectively.
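Composed streams follow the decorator pattern: each one wraps another stream and presents the same interface. The sketch below is hypothetical (VectorStream, ArrayVectorStream, TransformingStream, and FilteringStream are invented names). It shows a transformation and a simple row filter; the filter is only a stand-in for multidimensional selection, and an ordering stream would additionally have to buffer or index its input.

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical minimal stream interface.
interface VectorStream {
    boolean next();   // advance the cursor; false at the end
    double[] read();  // vector at the current cursor position
}

// Source stream over an in-memory array.
final class ArrayVectorStream implements VectorStream {
    private final double[][] rows;
    private int cursor = -1;

    ArrayVectorStream(double[][] rows) { this.rows = rows; }

    public boolean next() {
        if (cursor + 1 >= rows.length) return false;
        cursor++;
        return true;
    }

    public double[] read() {
        if (cursor < 0) throw new IllegalStateException("no current row");
        return rows[cursor].clone();
    }
}

// Composed stream: applies a transformation to every vector it reads.
final class TransformingStream implements VectorStream {
    private final VectorStream source;
    private final UnaryOperator<double[]> transform;

    TransformingStream(VectorStream source, UnaryOperator<double[]> transform) {
        this.source = source;
        this.transform = transform;
    }

    public boolean next() { return source.next(); }

    public double[] read() { return transform.apply(source.read()); }
}

// Composed stream: passes on only the vectors that satisfy a predicate.
final class FilteringStream implements VectorStream {
    private final VectorStream source;
    private final Predicate<double[]> keep;
    private double[] current; // cached so the source is read once per row

    FilteringStream(VectorStream source, Predicate<double[]> keep) {
        this.source = source;
        this.keep = keep;
    }

    public boolean next() {
        while (source.next()) {
            double[] v = source.read();
            if (keep.test(v)) {
                current = v;
                return true;
            }
        }
        current = null;
        return false;
    }

    public double[] read() {
        if (current == null) throw new IllegalStateException("no current row");
        return current.clone();
    }
}
```

Because each wrapper relies only on next and read of its source, wrappers can be stacked in any order, for example new FilteringStream(new TransformingStream(base, f), p).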