Notice that MiningInputStream implements the interface MiningVectorSet,
which is the most general container of mining vectors. MiningVectorSet contains
only the method getMetaData, which provides access to the basis of the attribute
space. MiningInputStream, as a container of countable sets of mining vectors, is the
most important implementation of MiningVectorSet. However, some applications, such
as reinforcement learning (Sect. 12.2.2), require access to uncountable sets of mining
vectors, such as all points of a domain or all unit vectors starting from one point.
In this case, the mining vector set must be represented at a more abstract level, for
example, by functions describing its boundary or by geometric objects. We will return
to this topic in Sect. 12.2.2; for now, we only note that MiningInputStream is
sufficient for data mining applications.
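The following Java sketch contrasts the two kinds of vector sets described above. Apart from getMetaData, next, and read, every type and method name here is a hypothetical simplification, not a XELOPES signature: the metadata is reduced to a list of attribute names and a mining vector to a double array. The uncountable set of all unit vectors starting at one point is represented by a parameterizing function and a membership test instead of an enumeration.

```java
import java.util.List;

// Hypothetical, simplified stand-ins; the real XELOPES classes differ.
interface MiningVectorSet {
    // The only required method: the basis of the attribute space,
    // reduced here to a list of attribute names.
    List<String> getMetaData();
}

// Countable case: vectors are enumerated one at a time by a cursor.
abstract class SimpleMiningInputStream implements MiningVectorSet {
    abstract boolean next();  // advance the cursor; false at the end
    abstract double[] read(); // vector at the current cursor position
}

// Uncountable case: all unit vectors starting at (x0, y0) in the plane,
// described by a function of the angle instead of an enumeration.
class UnitVectorsFromPoint implements MiningVectorSet {
    private final double x0;
    private final double y0;

    UnitVectorsFromPoint(double x0, double y0) {
        this.x0 = x0;
        this.y0 = y0;
    }

    @Override
    public List<String> getMetaData() {
        return List.of("x", "y");
    }

    // End point of the unit vector with angle theta (radians).
    double[] endPointAt(double theta) {
        return new double[] {x0 + Math.cos(theta), y0 + Math.sin(theta)};
    }

    // Membership test: is (x, y) the end point of a unit vector from (x0, y0)?
    boolean contains(double x, double y, double eps) {
        return Math.abs(Math.hypot(x - x0, y - y0) - 1.0) < eps;
    }
}
```

Both classes answer getMetaData in the same way, but only the countable one can be traversed; the uncountable one can merely be sampled or queried.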
MiningInputStream offers a graded spectrum of data access methods, depending on
its implementation. In the simplest case, the data matrix can be traversed only once
with a cursor, using the method next. If the reset method is supported, the cursor
can be reset to the initial position. This access type is often supported by files and
databases. In a more convenient case, the cursor can be moved arbitrarily using the
move method (e.g., for databases supporting JDBC 2.0). Most convenient of all is
direct access to the data array of the data matrix, if the matrix fits into memory
(e.g., class MiningStoredData).
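A minimal Java sketch of this graded spectrum, assuming a simplified cursor abstraction (CursorStream, InMemoryStream, and getDataArray are hypothetical names, not part of XELOPES). The mandatory methods next and read are abstract, while reset and move throw UnsupportedOperationException unless a subclass supports them; the in-memory class, loosely analogous to MiningStoredData, supports every level, including direct access to the data array.

```java
import java.util.Arrays;

// Hypothetical sketch of the graded access spectrum; not the XELOPES API.
abstract class CursorStream {
    protected int cursor = -1; // positioned before the first row

    // Minimal access: forward-only traversal.
    abstract boolean next();

    // Assumes a preceding successful next() or move().
    abstract double[] read();

    // Optional: rewind to the initial position (files, many databases).
    void reset() {
        throw new UnsupportedOperationException("reset not supported");
    }

    // Optional: arbitrary cursor moves (e.g., scrollable JDBC 2.0 result sets).
    boolean move(int position) {
        throw new UnsupportedOperationException("move not supported");
    }
}

// Fully in-memory stream: supports every access level.
class InMemoryStream extends CursorStream {
    private final double[][] data;

    InMemoryStream(double[][] data) {
        this.data = data;
    }

    @Override
    boolean next() {
        if (cursor + 1 >= data.length) return false;
        cursor++;
        return true;
    }

    @Override
    double[] read() {
        // Return a copy so callers cannot modify the stored row.
        return Arrays.copyOf(data[cursor], data[cursor].length);
    }

    @Override
    void reset() {
        cursor = -1;
    }

    @Override
    boolean move(int position) {
        if (position < 0 || position >= data.length) return false;
        cursor = position;
        return true;
    }

    // Direct access: only sensible when the whole matrix fits in memory.
    double[][] getDataArray() {
        return data;
    }
}
```

A forward-only source would subclass CursorStream and override only next and read, inheriting the throwing defaults for the optional methods.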
The read method returns the mining vector at the current cursor position. Every
full implementation of MiningInputStream must support at least the next and read
methods. In addition, MiningInputStream may implement the interface
MiningOutputStream to write data to the data source. Each mining input stream is
reflective: the method getSupportedStream returns all data access (and update)
methods supported by the current implementation.
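To illustrate reflection, the sketch below lets a stream report its capabilities and lets a client check them before relying on reset. The AccessMethod enum, the VectorWriter interface (standing in for MiningOutputStream), and the Set return type of getSupportedStream are assumptions made for illustration; the real XELOPES method may report capabilities in a different form. The client computes a two-pass variance, which requires the cursor to be rewound.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical capability names; not the XELOPES representation.
enum AccessMethod { NEXT, READ, RESET, MOVE, WRITE }

// Minimal reflective stream: next/read are mandatory, reset is optional.
interface ReflectiveStream {
    Set<AccessMethod> getSupportedStream();
    boolean next();
    double[] read();
    default void reset() {
        throw new UnsupportedOperationException("reset not supported");
    }
}

// Optional write side, loosely modeled on MiningOutputStream.
interface VectorWriter {
    void write(double[] vector);
}

// In-memory stream that supports reading, rewinding, and writing.
class BufferStream implements ReflectiveStream, VectorWriter {
    private final List<double[]> rows = new ArrayList<>();
    private int cursor = -1;

    @Override
    public Set<AccessMethod> getSupportedStream() {
        return EnumSet.of(AccessMethod.NEXT, AccessMethod.READ,
                          AccessMethod.RESET, AccessMethod.WRITE);
    }

    @Override
    public boolean next() {
        if (cursor + 1 >= rows.size()) return false;
        cursor++;
        return true;
    }

    @Override
    public double[] read() {
        return rows.get(cursor).clone();
    }

    @Override
    public void reset() {
        cursor = -1;
    }

    @Override
    public void write(double[] vector) {
        rows.add(vector.clone());
    }
}

final class TwoPassVariance {
    // Population variance of column j in two passes (mean, then squared
    // deviations). The client queries the stream's capabilities first
    // instead of calling reset() and catching a failure.
    static double variance(ReflectiveStream s, int j) {
        if (!s.getSupportedStream().contains(AccessMethod.RESET)) {
            throw new IllegalStateException("two passes need RESET");
        }
        s.reset();
        double sum = 0;
        int n = 0;
        while (s.next()) {
            sum += s.read()[j];
            n++;
        }
        double mean = sum / n;
        s.reset();
        double sq = 0;
        while (s.next()) {
            double d = s.read()[j] - mean;
            sq += d * d;
        }
        return sq / n;
    }
}
```

For a forward-only stream, a client could fall back to a one-pass algorithm instead of throwing; the exception here simply keeps the sketch short.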
The mining input stream concept is a direct consequence of the fact that almost
every data mining algorithm requires a data matrix as input. In the language of
CWM, we would say: the logical model of data mining is of the Classifier type.
Thus, MiningInputStream extends the CWM class Class.
The physical model describes the physical data source that is used for mining,
like a text file or a database. For the data mining process, the physical model must
be mapped to the logical one (Fig. 12.7).
In XELOPES, this mapping is done by subclassing: different types of physical
data sources can be accessed through different mining input stream classes that
extend MiningInputStream. Important stream classes of XELOPES are listed in
Table 12.1. It is often useful to write custom stream classes that extend
MiningInputStream or one of its subclasses.
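The following sketch shows the subclassing idea for one kind of physical source: comma-separated text with a header line. BaseVectorStream and CsvTextStream are hypothetical simplified classes, not XELOPES classes. The header supplies the attribute names (the basis of the logical model), and each further line becomes one numeric mining vector. Reading from a String keeps the example self-contained; a file-based variant would wrap a FileReader instead.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical simplified base class standing in for MiningInputStream.
abstract class BaseVectorStream {
    abstract List<String> getMetaData(); // attribute names (logical model)
    abstract boolean next();
    abstract double[] read();
}

// Maps a physical source (CSV text with a header line) onto the logical
// model: a named attribute space plus a cursor over numeric rows.
// Assumes the text contains at least the header line.
final class CsvTextStream extends BaseVectorStream {
    private final String text;
    private BufferedReader reader;
    private List<String> attributes;
    private double[] current;

    CsvTextStream(String text) {
        this.text = text;
        reset();
    }

    // Rewind by re-reading the text from the start; the header is parsed again.
    void reset() {
        reader = new BufferedReader(new StringReader(text));
        attributes = Arrays.asList(readLineOrNull().split(","));
        current = null;
    }

    private String readLineOrNull() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    List<String> getMetaData() {
        return attributes;
    }

    @Override
    boolean next() {
        String line = readLineOrNull();
        if (line == null || line.trim().isEmpty()) return false;
        String[] cells = line.split(",");
        current = new double[cells.length];
        for (int i = 0; i < cells.length; i++) {
            current[i] = Double.parseDouble(cells[i].trim());
        }
        return true;
    }

    @Override
    double[] read() {
        return current.clone();
    }
}
```

Algorithms written against the base class never see the text format; swapping in a database-backed subclass would leave them unchanged.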
Notice that the last three streams are composed streams which take an arbitrary
mining input stream as input and apply a transformation and multidimensional
selection/ordering to the stream, respectively.
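Composed streams can be understood as decorators: they wrap an arbitrary stream and change what it delivers. In this hypothetical sketch (none of the names come from XELOPES), TransformedStream applies a function to each vector and SelectedStream skips vectors that fail a predicate. Ordering is omitted because it would require buffering the whole source stream.

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical minimal stream contract; not the XELOPES API.
interface RowStream {
    boolean next();
    double[] read();
}

// Simple array-backed source stream.
class ArrayRowStream implements RowStream {
    private final double[][] rows;
    private int cursor = -1;

    ArrayRowStream(double[][] rows) {
        this.rows = rows;
    }

    public boolean next() {
        if (cursor + 1 >= rows.length) return false;
        cursor++;
        return true;
    }

    public double[] read() {
        return rows[cursor].clone();
    }
}

// Composed stream: wraps any RowStream and transforms each vector on read.
class TransformedStream implements RowStream {
    private final RowStream source;
    private final UnaryOperator<double[]> f;

    TransformedStream(RowStream source, UnaryOperator<double[]> f) {
        this.source = source;
        this.f = f;
    }

    public boolean next() {
        return source.next();
    }

    public double[] read() {
        return f.apply(source.read());
    }
}

// Composed stream: wraps any RowStream and skips vectors failing a predicate.
class SelectedStream implements RowStream {
    private final RowStream source;
    private final Predicate<double[]> keep;

    SelectedStream(RowStream source, Predicate<double[]> keep) {
        this.source = source;
        this.keep = keep;
    }

    public boolean next() {
        // Advance the wrapped stream until a vector passes the predicate.
        while (source.next()) {
            if (keep.test(source.read())) return true;
        }
        return false;
    }

    public double[] read() {
        return source.read();
    }
}
```

Because each wrapper accepts any RowStream, wrappers can be stacked, for example a selection over a transformation over an array-backed stream.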